Commit 00d46b80 authored by Shuhei Matsumoto, committed by Tomasz Zawadzki

bdev/nvme: Disable automatic failback in multipath mode



By default, failback to the preferred I/O path is done automatically
if it is restored. Some users may want to keep using the backup I/O
path even if the preferred I/O path is restored. In this case,
bdev_nvme_set_preferred_path can be used to do manual failback.

We could clear and refill the I/O path cache more strictly, but that would
be complicated and prone to bugs. This patch makes the minimal change and
only skips the apparent case.

Signed-off-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Change-Id: I78fe5faee6ff04e88ae3d7c6be6da1c20637c912
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/12431


Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
parent b0ab5524
+3 −0
@@ -29,6 +29,9 @@ when in multipath mode. This RPC does not support NVMe bdevs in failover mode.
 A new RPC `bdev_nvme_set_multipath_policy` was added to set multipath policy of a NVMe bdev
 in multipath mode.

+A new option `disable_auto_failback` was added to the `bdev_nvme_set_options` RPC to disable
+automatic failback.
+
 ### idxd

 A new parameter `flags` was added to all low level submission and preparation
+1 −0
@@ -2963,6 +2963,7 @@ transport_ack_timeout | Optional | number | Time to wait ack until ret
 ctrlr_loss_timeout_sec     | Optional | number      | Time to wait until ctrlr is reconnected before deleting ctrlr.  -1 means infinite reconnects. 0 means no reconnect.
 reconnect_delay_sec        | Optional | number      | Time to delay a reconnect trial. 0 means no reconnect.
 fast_io_fail_timeout_sec   | Optional | number      | Time to wait until ctrlr is reconnected before failing I/O to ctrlr. 0 means no such timeout.
+disable_auto_failback      | Optional | boolean     | Disable automatic failback. The RPC bdev_nvme_set_preferred_path can be used to do manual failback.

 #### Example

+4 −1
@@ -58,7 +58,10 @@ For active-passive policy, each I/O channel for an NVMe bdev has a cache to stor
 I/O path which is connected and optimal from ANA and use it for I/O submission. Some users may want
 to specify the preferred I/O path manually. They can dynamically set the preferred I/O path using
 the `bdev_nvme_set_preferred_path` RPC. Such assignment is realized naturally by moving the
-I/O path to the head of the I/O path list.
+I/O path to the head of the I/O path list. By default, if the preferred I/O path is restored,
+failback to it is done automatically. The automatic failback can be disabled by a global option
+`disable_auto_failback`. In this case, the `bdev_nvme_set_preferred_path` RPC can be used
+to do manual failback.

 The active-active policy uses the round-robin algorithm and submits an I/O to each I/O path in
 circular order.
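
The failback behavior described in this documentation change can be illustrated with a small standalone model. The sketch below is not SPDK code: `struct io_path`, `struct channel`, `find_io_path()` and `path_restored()` are hypothetical names, and it assumes a channel with exactly two paths where index 0 is the preferred one. It only shows how leaving the cached path untouched on restoration keeps I/O on the backup path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model. These are not SPDK types. */
struct io_path {
	const char *name;
	bool connected;
};

struct channel {
	struct io_path *paths[2];	/* paths[0] is the preferred path */
	struct io_path *current;	/* cached I/O path, NULL means "look up again" */
};

/* Stand-in for the global option added by this commit. */
static bool disable_auto_failback = false;

/* Return the cached path if it is still connected, otherwise cache
 * and return the first connected path in list order.
 */
static struct io_path *
find_io_path(struct channel *ch)
{
	size_t i;

	if (ch->current != NULL && ch->current->connected) {
		return ch->current;
	}

	ch->current = NULL;
	for (i = 0; i < 2; i++) {
		if (ch->paths[i]->connected) {
			ch->current = ch->paths[i];
			break;
		}
	}
	return ch->current;
}

/* Model of a path coming back. The cache is cleared, which triggers
 * failback on the next lookup, only when automatic failback is enabled.
 */
static void
path_restored(struct channel *ch, struct io_path *path)
{
	assert(path != NULL);

	path->connected = true;
	if (!disable_auto_failback) {
		ch->current = NULL;
	}
}
```

In this model, with `disable_auto_failback` set to `true`, a call to `path_restored()` leaves `find_io_path()` returning the backup path. With the default `false`, the next lookup returns the preferred path again.
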
+12 −4
@@ -145,6 +145,7 @@ static struct spdk_bdev_nvme_opts g_opts = {
 	.ctrlr_loss_timeout_sec = 0,
 	.reconnect_delay_sec = 0,
 	.fast_io_fail_timeout_sec = 0,
+	.disable_auto_failback = false,
 };

 #define NVME_HOTPLUG_POLL_PERIOD_MAX			10000000ULL
@@ -1415,7 +1416,9 @@ bdev_nvme_create_qpair(struct nvme_qpair *nvme_qpair)

 	nvme_qpair->qpair = qpair;

-	_bdev_nvme_clear_io_path_cache(nvme_qpair);
+	if (!g_opts.disable_auto_failback) {
+		_bdev_nvme_clear_io_path_cache(nvme_qpair);
+	}

 	return 0;

@@ -3601,14 +3604,19 @@ _bdev_nvme_set_preferred_path(struct spdk_io_channel_iter *i)
 		prev = io_path;
 	}

-	if (io_path != NULL && prev != NULL) {
-		STAILQ_REMOVE_AFTER(&nbdev_ch->io_path_list, prev, stailq);
-		STAILQ_INSERT_HEAD(&nbdev_ch->io_path_list, io_path, stailq);
+	if (io_path != NULL) {
+		if (prev != NULL) {
+			STAILQ_REMOVE_AFTER(&nbdev_ch->io_path_list, prev, stailq);
+			STAILQ_INSERT_HEAD(&nbdev_ch->io_path_list, io_path, stailq);
+		}

 		/* We can set io_path to nbdev_ch->current_io_path directly here.
 		 * However, it needs to be conditional. To simplify the code,
 		 * just clear nbdev_ch->current_io_path and let find_io_path()
 		 * fill it.
+		 *
+		 * Automatic failback may be disabled. Hence even if the io_path is
+		 * already at the head, clear nbdev_ch->current_io_path.
 		 */
 		nbdev_ch->current_io_path = NULL;
 	}
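
The reworked condition in this hunk can be read as "move the path to the head only if it is not already there, but clear the cache whenever the path is found". Below is a minimal sketch of that logic on a hand-written singly linked list instead of the `STAILQ` macros. `struct path_node`, `struct path_list` and `set_preferred_path()` are hypothetical names for illustration, and matching a path by an integer `ctrlr_id` is an assumption, not how the real function identifies a path.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the I/O path list and its cache. */
struct path_node {
	int ctrlr_id;
	struct path_node *next;
};

struct path_list {
	struct path_node *head;
	struct path_node *current;	/* cached path, NULL forces a new lookup */
};

static void
set_preferred_path(struct path_list *list, int ctrlr_id)
{
	struct path_node *prev = NULL;
	struct path_node *p;

	assert(list != NULL);

	/* Find the requested path and remember its predecessor. */
	for (p = list->head; p != NULL; p = p->next) {
		if (p->ctrlr_id == ctrlr_id) {
			break;
		}
		prev = p;
	}

	if (p != NULL) {
		if (prev != NULL) {
			/* Not at the head yet: unlink it and reinsert it at the head. */
			prev->next = p->next;
			p->next = list->head;
			list->head = p;
		}

		/* Clear the cache even if the path was already at the head.
		 * With automatic failback disabled, the cache may still point
		 * at a backup path.
		 */
		list->current = NULL;
	}
}
```

If the requested path is not in the list, the sketch leaves both the list order and the cache unchanged, which matches the `io_path != NULL` guard in the hunk.
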
+1 −0
@@ -276,6 +276,7 @@ struct spdk_bdev_nvme_opts {
 	int32_t ctrlr_loss_timeout_sec;
 	uint32_t reconnect_delay_sec;
 	uint32_t fast_io_fail_timeout_sec;
+	bool disable_auto_failback;
 };

 struct spdk_nvme_qpair *bdev_nvme_get_io_qpair(struct spdk_io_channel *ctrlr_io_ch);