Commit 52ecd5ce authored by Vineet Madan, committed by Tomasz Zawadzki

bdev/nvme: fix race between controller destruction and reset



The following race condition can occur:
1. A controller reset is in progress (iterating through channels).
2. The controller is being removed/destroyed at the same time.
3. The iterator visits a channel that has a connect_poller active.
4. The channel is destroyed (in 'bdev_nvme_destroy_ctrlr_channel_cb'),
   but the connect_poller is left running.
5. The connect_poller fires after the qpair has been freed, thus hitting
   the assertion failure `qpair != NULL`.

Fix:
Unregister the 'connect_poller' when disconnecting the qpair in
'bdev_nvme_destroy_ctrlr_channel_cb'.
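
The sequence above can be sketched with a small standalone model. This is
an illustration only, not SPDK code: every toy_* name and
simulate_teardown() are hypothetical stand-ins, the "reactor" holds a
single poller, and a stale poll is counted instead of tripping the
`qpair != NULL` assertion. The only behavior borrowed from SPDK is the
contract of spdk_poller_unregister(), which accepts a pointer to the
poller pointer, tolerates NULL, and clears the caller's pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-ins for the qpair, poller and controller channel (hypothetical). */
struct toy_qpair { int id; };
struct toy_poller { int (*fn)(void *ctx); void *ctx; };
struct toy_channel {
	struct toy_qpair *qpair;
	struct toy_poller *connect_poller;
};

static struct toy_poller *g_registered; /* one-slot "reactor" */
static int g_stale_polls;               /* polls that ran without a qpair */

static struct toy_poller *
toy_poller_register(int (*fn)(void *ctx), void *ctx)
{
	struct toy_poller *p = malloc(sizeof(*p));

	assert(p != NULL);
	p->fn = fn;
	p->ctx = ctx;
	g_registered = p;
	return p;
}

/* Same contract as spdk_poller_unregister(): NULL-safe, clears *pp. */
static void
toy_poller_unregister(struct toy_poller **pp)
{
	if (*pp == NULL) {
		return;
	}
	if (g_registered == *pp) {
		g_registered = NULL;
	}
	free(*pp);
	*pp = NULL;
}

static int
toy_connect_poll(void *ctx)
{
	struct toy_channel *ch = ctx;

	if (ch->qpair == NULL) {
		/* The real poller would hit the assertion here. */
		g_stale_polls++;
	}
	return 0;
}

/* Models channel destruction, with or without the fix. */
static void
toy_destroy_channel(struct toy_channel *ch, bool unregister_poller)
{
	free(ch->qpair);
	ch->qpair = NULL;
	if (unregister_poller) {
		toy_poller_unregister(&ch->connect_poller);
	}
}

/* Returns how many times the poller ran after the qpair was freed. */
int
simulate_teardown(bool unregister_poller)
{
	struct toy_channel ch = {0};

	g_stale_polls = 0;
	g_registered = NULL;
	ch.qpair = calloc(1, sizeof(*ch.qpair));
	assert(ch.qpair != NULL);
	ch.connect_poller = toy_poller_register(toy_connect_poll, &ch);

	toy_destroy_channel(&ch, unregister_poller);
	if (g_registered != NULL) {
		g_registered->fn(g_registered->ctx); /* next reactor iteration */
	}

	toy_poller_unregister(&ch.connect_poller); /* clean up the leaked poller */
	return g_stale_polls;
}
```

In this model, simulate_teardown(false) leaves the poller registered, so
it runs once against a channel with no qpair; simulate_teardown(true)
unregisters it during teardown, so it never runs.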

Change-Id: I503cc5e56dae35bd30e68b4867c900d1a0bf5a89
Signed-off-by: Vineet Madan <vineet.madan@nutanix.com>
Reviewed-on: https://review.spdk.io/c/spdk/spdk/+/26821
Reviewed-by: Jacek Kalwas <jacek.kalwas@nutanix.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK Automated Test System <spdkbot@gmail.com>
Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
parent b6883829
+6 −0
@@ -3777,6 +3777,12 @@ bdev_nvme_destroy_ctrlr_channel_cb(void *io_device, void *ctx_buf)
		 * The qpair may have been created after the reset process started.
		 */
		spdk_nvme_ctrlr_disconnect_io_qpair(nvme_qpair->qpair);

		/* Since the channel is being destroyed, unregister any connect_poller
		 * that might be active for the channel.
		 */
		spdk_poller_unregister(&ctrlr_ch->connect_poller);

		if (ctrlr_ch->reset_iter) {
			/* Skip current ctrlr_channel in a full reset sequence because
			 * it is being deleted now.