Commit 223bffbb authored by Jim Harris's avatar Jim Harris Committed by Tomasz Zawadzki
Browse files

nvme: don't try to enumerate PCI if just received a remove event



There is a deadlock condition that happens when hot-removing
an SR-IOV enabled SSD with the bdev/nvme hotplug poller enabled.

Thread A is handling the remove event in the kernel. This could
be userspace writing '1' to the pci sysfs 'remove' entry. It holds
the removed device's lock, writes to the vfio notifier eventfd,
and waits for its completion.

DPDK receives the notification on the interrupt thread, and passes
it along to SPDK. SPDK just marks its version of the device's
struct as having been removed.

Then the SPDK bdev nvme hotplug poller runs, calling
spdk_nvme_probe_async() which calls into nvme_pcie_ctrlr_scan().

nvme_pcie_ctrlr_scan() first checks for events from the SPDK pci
layer, and gets the removal notification. But it does not immediate
detach from the PCIe device, it calls the upper layer's remove_cb
which will allow the upper layer to do any necessary cleanup and
detach from the controller itself. This means we do not respond
to the vfio removal request yet, and that the device lock is still
held by thread A.

But then we do an spdk_pci_enumerate(), which triggers DPDK to
do a full sysfs scan of all of the PCI devices. It will read
the sriov_numvfs sysfs entry if it exists (which it does for an
SR-IOV SSD), but the sriov_numvfs_show() function in the kernel
tries to acquire the device_lock() and has to wait. But this
thread is also the one that will eventually process the
remove_cb and detach the controller, which it cannot do now,
so thread A will remain waiting for the completion notification
and never release the device_lock().

So workaround for now is to skip the spdk_pci_enumerate() if
we have detected a removal event. This gives us at least one
bdev nvme hotplug poller period to get the controller detached
and release the vfio handle before trying to spdk_pci_enumerate()
again.

Fixes issue #3205

Change-Id: I1ccf7e0077479141e3d73efc629fa565428bf219
Signed-off-by: default avatarJim Harris <jim.harris@samsung.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/20987


Reviewed-by: default avatarMichal Berger <michal.berger@intel.com>
Community-CI: Mellanox Build Bot
Reviewed-by: default avatarTomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: default avatarMichael Haeuptle <michaelhaeuptle@gmail.com>
Reviewed-by: default avatarKonrad Sztyber <konrad.sztyber@intel.com>
Tested-by: default avatarSPDK CI Jenkins <sys_sgci@intel.com>
parent db97955b
Loading
Loading
Loading
Loading
+9 −2
Original line number Diff line number Diff line
@@ -103,6 +103,7 @@ _nvme_pcie_hotplug_monitor(struct spdk_nvme_probe_ctx *probe_ctx)
{
	struct spdk_nvme_ctrlr *ctrlr, *tmp;
	struct spdk_pci_event event;
	int rc = 0;

	if (g_spdk_nvme_driver->hotplug_fd >= 0) {
		while (spdk_pci_get_event(g_spdk_nvme_driver->hotplug_fd, &event) > 0) {
@@ -124,6 +125,7 @@ _nvme_pcie_hotplug_monitor(struct spdk_nvme_probe_ctx *probe_ctx)
		pctrlr = nvme_pcie_ctrlr(ctrlr);
		if (spdk_pci_device_is_removed(pctrlr->devhandle)) {
			do_remove = true;
			rc = 1;
		}

		if (do_remove) {
@@ -137,7 +139,7 @@ _nvme_pcie_hotplug_monitor(struct spdk_nvme_probe_ctx *probe_ctx)
			}
		}
	}
	return 0;
	return rc;
}

static volatile void *
@@ -867,7 +869,12 @@ nvme_pcie_ctrlr_scan(struct spdk_nvme_probe_ctx *probe_ctx,

	/* Only the primary process can monitor hotplug. */
	if (spdk_process_is_primary()) {
		_nvme_pcie_hotplug_monitor(probe_ctx);
		if (_nvme_pcie_hotplug_monitor(probe_ctx) > 0) {
			/* Some removal events were received. Return immediately, avoiding
			 * an spdk_pci_enumerate() which could trigger issue #3205.
			 */
			return 0;
		}
	}

	if (enum_ctx.has_pci_addr == false) {