Commit 4985289a authored by Alexey Marchuk, committed by Tomasz Zawadzki

nvme/rdma: Lock mutex when destroying lingering qpair



Handling of a lingering qpair's destruction differs from the
regular destroy/disconnect path: the controller's mutex is
not locked in that case. That can lead to a race condition
where nvme_rdma_qpair_destroy iterates the controller's
outstanding rdma_cm events and acks those belonging to the
qpair while another thread polls rdma_cm events and reaps an
event for the qpair being destroyed. In that case we attempt
to destroy an rdma_cm id that still has unprocessed events,
and rdma_destroy_id gets stuck.

Fixes issue #3347

Signed-off-by: Alexey Marchuk <alexeymar@nvidia.com>
Change-Id: I3470c6080e2c19a63eb65eecc398dccd92327eb9
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/23324


Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-by: Ben Walker <ben@nvidia.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
parent 956fd5e1
+9 −1
@@ -1963,6 +1963,9 @@ quiet:
static int
nvme_rdma_qpair_wait_until_quiet(struct nvme_rdma_qpair *rqpair)
{
	struct spdk_nvme_qpair *qpair = &rqpair->qpair;
	struct spdk_nvme_ctrlr *ctrlr = qpair->ctrlr;

	if (spdk_get_ticks() < rqpair->evt_timeout_ticks &&
	    (rqpair->current_num_sends != 0 ||
	     (!rqpair->srq && rqpair->rsps->current_num_recvs != 0))) {
@@ -1970,9 +1973,14 @@ nvme_rdma_qpair_wait_until_quiet(struct nvme_rdma_qpair *rqpair)
	}

	rqpair->state = NVME_RDMA_QPAIR_STATE_EXITED;

	nvme_rdma_qpair_abort_reqs(&rqpair->qpair, 0);
	if (!nvme_qpair_is_admin_queue(qpair)) {
		nvme_robust_mutex_lock(&ctrlr->ctrlr_lock);
	}
	nvme_rdma_qpair_destroy(rqpair);
	if (!nvme_qpair_is_admin_queue(qpair)) {
		nvme_robust_mutex_unlock(&ctrlr->ctrlr_lock);
	}
	nvme_transport_ctrlr_disconnect_qpair_done(&rqpair->qpair);

	return 0;