Commit 87ebcb08 authored by Evgeniy Kochetov's avatar Evgeniy Kochetov Committed by Ben Walker
Browse files

nvmf/rdma: Handle completions for destroyed QP associated with SRQ



IB Architecture Specification vol.1 rel.13. in ch.10.3.1 "QUEUE PAIR
AND EE CONTEXT STATES" suggests the following destroy procedure for
QPs associated with SRQ:
- Put the QP in the Error State;
- wait for the Affiliated Asynchronous Last WQE Reached Event;
- either:
  * drain the CQ by invoking the Poll CQ verb and either wait for CQ
    to be empty or the number of Poll CQ operations has exceeded CQ
    capacity size; or
  * post another WR that completes on the same CQ and wait for this WR
    to return as a WC;
- and then invoke a Destroy QP or Reset QP.

Without the drain step it is possible that LAST_WQE_REACHED event is
received and QP is destroyed before the last receive WR completion is
polled from the CQ.

In SPDK there is no risk of resource leakage in this case. So, instead
of draining we can destroy QP and then just ignore receive completions
without QP and post receive WRs back to SRQ.

Fixes #903

Signed-off-by: default avatarEvgeniy Kochetov <evgeniik@mellanox.com>
Signed-off-by: default avatarSasha Kotchubievsky <sashakot@mellanox.com>
Signed-off-by: default avatarAlexey Marchuk <alexeymar@mellanox.com>
Change-Id: Ice6d3d5afc205c489f768e3b51c6cda8809bee9a
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/465747


Reviewed-by: default avatarSeth Howell <seth.howell@intel.com>
Reviewed-by: default avatarBen Walker <benjamin.walker@intel.com>
Reviewed-by: default avatarShuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Tested-by: default avatarSPDK CI Jenkins <sys_sgci@intel.com>
parent 660b7fae
Loading
Loading
Loading
Loading
+17 −0
Original line number Diff line number Diff line
@@ -3406,6 +3406,23 @@ spdk_nvmf_rdma_poller_poll(struct spdk_nvmf_rdma_transport *rtransport,
			rdma_recv = SPDK_CONTAINEROF(rdma_wr, struct spdk_nvmf_rdma_recv, rdma_wr);
			if (rpoller->srq != NULL) {
				rdma_recv->qpair = get_rdma_qpair_from_wc(rpoller, &wc[i]);
				/* It is possible that there are still some completions for destroyed QP
				 * associated with SRQ. We just ignore these late completions and re-post
				 * receive WRs back to SRQ.
				 */
				if (spdk_unlikely(NULL == rdma_recv->qpair)) {
					struct ibv_recv_wr *bad_wr;
					int rc;

					rdma_recv->wr.next = NULL;
					rc = ibv_post_srq_recv(rpoller->srq,
							       &rdma_recv->wr,
							       &bad_wr);
					if (rc) {
						SPDK_ERRLOG("Failed to re-post recv WR to SRQ, err %d\n", rc);
					}
					continue;
				}
			}
			rqpair = rdma_recv->qpair;