+9
−2
Loading
There is a deadlock condition that happens when hot-removing an SR-IOV enabled SSD with the bdev/nvme hotplug poller enabled. Thread A is handling the remove event in the kernel. This could be userspace writing '1' to the pci sysfs 'remove' entry. It holds the removed device's lock, writes to the vfio notifier eventfd, and waits for its completion. DPDK receives the notification on the interrupt thread, and passes it along to SPDK. SPDK just marks its version of the device's struct as having been removed. Then the SPDK bdev nvme hotplug poller runs, calling spdk_nvme_probe_async() which calls into nvme_pcie_ctrlr_scan(). nvme_pcie_ctrlr_scan() first checks for events from the SPDK pci layer, and gets the removal notification. But it does not immediate detach from the PCIe device, it calls the upper layer's remove_cb which will allow the upper layer to do any necessary cleanup and detach from the controller itself. This means we do not respond to the vfio removal request yet, and that the device lock is still held by thread A. But then we do an spdk_pci_enumerate(), which triggers DPDK to do a full sysfs scan of all of the PCI devices. It will read the sriov_numvfs sysfs entry if it exists (which it does for an SR-IOV SSD), but the sriov_numvfs_show() function in the kernel tries to acquire the device_lock() and has to wait. But this thread is also the one that will eventually process the remove_cb and detach the controller, which it cannot do now, so thread A will remain waiting for the completion notification and never release the device_lock(). So workaround for now is to skip the spdk_pci_enumerate() if we have detected a removal event. This gives us at least one bdev nvme hotplug poller period to get the controller detached and release the vfio handle before trying to spdk_pci_enumerate() again. Fixes issue #3205 Change-Id: I1ccf7e0077479141e3d73efc629fa565428bf219 Signed-off-by:Jim Harris <jim.harris@samsung.com> Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/20987 Reviewed-by:
Michal Berger <michal.berger@intel.com> Community-CI: Mellanox Build Bot Reviewed-by:
Tomasz Zawadzki <tomasz.zawadzki@intel.com> Reviewed-by:
Michael Haeuptle <michaelhaeuptle@gmail.com> Reviewed-by:
Konrad Sztyber <konrad.sztyber@intel.com> Tested-by:
SPDK CI Jenkins <sys_sgci@intel.com>