Commit ef2b6d5f authored by Jim Harris

env_dpdk: don't touch dpdk device handle after DPDK removal

If we get back-to-back removals, the first device may still be pending
removal from the SPDK device list. When the callback fires for the second
removal, we can't touch the rte dev_handle of that first device,
because DPDK has already freed it. So check the removed flag
on our spdk_pci_device structure to skip it and avoid dereferencing
freed memory.
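
The ordering described above can be sketched in isolation. This is a minimal illustration, not the SPDK code: `fake_rte_dev`, `fake_spdk_dev`, and `find_live_device` are hypothetical stand-ins, and only the `removed` flag and `dev_handle` field mirror names from this commit. The point is that the flag is tested before the handle is dereferenced.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/queue.h>

/* Hypothetical stand-in for the DPDK-owned handle (not the real rte_pci_device). */
struct fake_rte_dev {
	const char *name;
};

/* Hypothetical stand-in for spdk_pci_device, reduced to what the sketch needs. */
struct fake_spdk_dev {
	struct fake_rte_dev *dev_handle; /* may dangle once removed is set */
	bool removed;
	TAILQ_ENTRY(fake_spdk_dev) tailq;
};

TAILQ_HEAD(fake_dev_list, fake_spdk_dev);

/*
 * Return the first non-removed device whose name matches device_name,
 * or NULL if there is none. Entries flagged as removed are skipped
 * before dev_handle is ever dereferenced.
 */
static struct fake_spdk_dev *
find_live_device(struct fake_dev_list *list, const char *device_name)
{
	struct fake_spdk_dev *dev;

	TAILQ_FOREACH(dev, list, tailq) {
		if (dev->removed) {
			/* The handle may already be freed: do not read its name. */
			continue;
		}

		if (strcmp(dev->dev_handle->name, device_name) != 0) {
			continue;
		}

		return dev;
	}

	return NULL;
}
```

With the two checks in the opposite order, a lookup that walks past an entry still waiting to be unlinked would read the name through a stale handle, which is the failure this commit avoids.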

While here, remove debug prints that were added specifically to
triage this issue.

Fixes issue #3599.

Signed-off-by: Jim Harris <jim.harris@nvidia.com>
Change-Id: I43f81145470f190a51c235ab512c1b9f419ba29e
Reviewed-on: https://review.spdk.io/c/spdk/spdk/+/25586
Reviewed-by: Changpeng Liu <changpeliu@tencent.com>
Tested-by: SPDK Automated Test System <spdkbot@gmail.com>
Reviewed-by: Michael Haeuptle <michaelhaeuptle@gmail.com>
Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com>
Community-CI: Mellanox Build Bot
parent 9f051b98
+8 −6
@@ -226,15 +226,17 @@ pci_device_rte_dev_event(const char *device_name,
 		TAILQ_FOREACH(dev, &g_pci_devices, internal.tailq) {
 			struct rte_pci_device *rte_dev = dev->dev_handle;
 
-			if (strcmp(dpdk_pci_device_get_name(rte_dev), device_name)) {
+			if (dev->internal.removed) {
+				/* DPDK already removed this device, we are still pending
+				 * removal of the device from the SPDK device list. Since
+				 * DPDK freed the device handle, we must not try to
+				 * get its device name.
+				 */
 				continue;
 			}
 
-			/* Note: these ERRLOGs are useful for triaging issue #2983. */
-			if (dev->internal.pending_removal || dev->internal.removed) {
-				SPDK_ERRLOG("Received event for device SPDK already tried to remove\n");
-				SPDK_ERRLOG("pending_removal=%d removed=%d\n", dev->internal.pending_removal,
-					    dev->internal.removed);
+			if (strcmp(dpdk_pci_device_get_name(rte_dev), device_name)) {
+				continue;
 			}
 
 			if (!dev->internal.pending_removal) {