Commit 0383e688 authored by Alex Michon, committed by Konrad Sztyber

bdev/nvme: Fix race between reset and qpair creation/deletion



We have the following race condition:
1) A reset is initiated. We iterate over all IO channels to destroy the
   qpairs.
2) A new IO channel is created. We create an nvme qpair for it.
3) The reset process continues. It iterates over all IO channels to
   recreate the nvme qpair. `reset_iter` is set on the IO channel
   created at step 2. (Note that we won't recreate a qpair for the IO
   channel created at step 2).
4) The IO channel created at step 2 gets deleted.
   `bdev_nvme_destroy_ctrlr_channel_cb` is called. We skip the qpair
   disconnection because `reset_iter` is set.
In the end, the qpair is never disconnected.

Ensure that we always disconnect qpairs, even if a reset is in progress.
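
The four steps above can be replayed in a small standalone model. This is only a sketch under stated assumptions: `mock_qpair`, `mock_ctrlr_channel`, the two `destroy_channel_*` helpers and `qpair_leaked_after_race` are hypothetical stand-ins, not SPDK definitions. Setting `connected = false` stands in for `spdk_nvme_ctrlr_disconnect_io_qpair`, and clearing `reset_iter` stands in for `nvme_ctrlr_for_each_channel_continue`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the SPDK structures; not the real definitions. */
struct mock_qpair {
	bool connected;
};

struct mock_ctrlr_channel {
	struct mock_qpair *qpair;	/* NULL when the channel has no qpair */
	void *reset_iter;		/* non-NULL while a reset iteration holds this channel */
};

/* Old teardown logic: disconnect only when no reset iterator is set. */
static void
destroy_channel_before_fix(struct mock_ctrlr_channel *ch)
{
	assert(ch != NULL);
	if (ch->qpair != NULL) {
		if (ch->reset_iter == NULL) {
			ch->qpair->connected = false;
		} else {
			/* Assumes the reset already disconnected the qpair. */
			ch->reset_iter = NULL;
		}
	}
}

/* New teardown logic: always disconnect, then release the iterator if set. */
static void
destroy_channel_after_fix(struct mock_ctrlr_channel *ch)
{
	assert(ch != NULL);
	if (ch->qpair != NULL) {
		ch->qpair->connected = false;
		if (ch->reset_iter != NULL) {
			ch->reset_iter = NULL;
		}
	}
}

/* Replays steps 1-4 and reports whether the qpair is still connected. */
static bool
qpair_leaked_after_race(bool patched)
{
	struct mock_qpair qpair = { .connected = false };
	struct mock_ctrlr_channel ch = { .qpair = NULL, .reset_iter = NULL };
	int iter_token = 0;

	/* Step 1: the reset destroys existing qpairs; this channel does not exist yet. */
	/* Step 2: the new channel is created with a freshly connected qpair. */
	qpair.connected = true;
	ch.qpair = &qpair;
	/* Step 3: the recreate pass reaches the channel and sets reset_iter. */
	ch.reset_iter = &iter_token;
	/* Step 4: the channel is deleted while reset_iter is still set. */
	if (patched) {
		destroy_channel_after_fix(&ch);
	} else {
		destroy_channel_before_fix(&ch);
	}

	return qpair.connected;
}
```

In this model, `qpair_leaked_after_race(false)` reports that the qpair is still connected after the channel is destroyed, while `qpair_leaked_after_race(true)` reports that it was disconnected.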

Change-Id: I48af99ed582ebfdcaf2a98a92e9077c048bc7c54
Signed-off-by: Alex Michon <amichon@kalrayinc.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/25430
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: Jim Harris <jim.harris@nvidia.com>
Community-CI: Community CI Samsung <spdk.community.ci.samsung@gmail.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
parent a5dab6cf
+6 −5
@@ -3642,12 +3642,13 @@ bdev_nvme_destroy_ctrlr_channel_cb(void *io_device, void *ctx_buf)
 	_bdev_nvme_clear_io_path_cache(nvme_qpair);
 
 	if (nvme_qpair->qpair != NULL) {
-		if (ctrlr_ch->reset_iter == NULL) {
-			spdk_nvme_ctrlr_disconnect_io_qpair(nvme_qpair->qpair);
-		} else {
+		/* Always try to disconnect the qpair, even if a reset is in progress.
+		 * The qpair may have been created after the reset process started.
+		 */
+		spdk_nvme_ctrlr_disconnect_io_qpair(nvme_qpair->qpair);
+		if (ctrlr_ch->reset_iter) {
 			/* Skip current ctrlr_channel in a full reset sequence because
-			 * it is being deleted now. The qpair is already being disconnected.
-			 * We do not have to restart disconnecting it.
+			 * it is being deleted now.
 			 */
 			nvme_ctrlr_for_each_channel_continue(ctrlr_ch->reset_iter, 0);
 		}