Commit e775a7a4 authored by Aleksey Marchuk, committed by Jim Harris

nvmf/rdma: Process pending events on every poll cycle



Originally we processed pending qpair events (pending send/read/write/buffer
queues) only when we reaped a CQE for a given qpair.
Then we found an issue related to pending buffer handling:
an IO request could be waiting for a buffer without receiving any CQE.
We fixed that by handling pending_buf_queue on idle CQ polling.

The latest issue (#3646) revealed that a qpair can have requests in the
pending send queue (sending responses) while no further CQEs are
expected for that qpair, which leads to a hang.

To fix both issues and prevent further related issues,
we decided to process all pending events on every poll
iteration. Idle qpairs (with queue depth 0) are skipped to
avoid unnecessary checks.

This patch also changes how we process new requests: previously
we put a new request at the head of the incoming queue and immediately
called qpair_process_pending, which picked it up. Now we may
reap several requests before calling qpair_process_pending, so to keep
all requests in order we must put each request at the tail of the list.

First fix for #3646

Change-Id: If92bab6e575b8e6a4c113bf281652f3a26b0c209
Signed-off-by: Aleksey Marchuk <alexeymar@nvidia.com>
Reviewed-on: https://review.spdk.io/c/spdk/spdk/+/26247
Reviewed-by: Konrad Sztyber <ksztyber@nvidia.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK Automated Test System <spdkbot@gmail.com>
Reviewed-by: Jim Harris <jim.harris@nvidia.com>
parent 443a0197
+14 −19
@@ -3436,18 +3436,20 @@ nvmf_rdma_qpair_process_pending(struct spdk_nvmf_rdma_transport *rtransport,
 	}
 }
 
-static void
-nvmf_rdma_poller_process_pending_buf_queue(struct spdk_nvmf_rdma_transport *rtransport,
+static inline void
+nvmf_rdma_poller_process_pending_qpairs(struct spdk_nvmf_rdma_transport *rtransport,
 					struct spdk_nvmf_rdma_poller *rpoller)
 {
-	struct spdk_nvmf_request *req, *tmp;
-	struct spdk_nvmf_rdma_request *rdma_req;
+	struct spdk_nvmf_rdma_qpair *rqpair, *tmp;
 
-	STAILQ_FOREACH_SAFE(req, &rpoller->group->group.pending_buf_queue, buf_link, tmp) {
-		rdma_req = SPDK_CONTAINEROF(req, struct spdk_nvmf_rdma_request, req);
-		if (nvmf_rdma_request_process(rtransport, rdma_req) == false) {
-			break;
+	/* TODO: Here we iterate all qpairs, active and not active and touch at least 2 cache lines per
+	 * qpair. On high scale with small number of active qpairs we may observe higher rate of L2 cache
+	 * misses. To solve this problem we need to maintain a dedicated list of active qpairs */
+	RB_FOREACH_SAFE(rqpair, qpairs_tree, &rpoller->qpairs, tmp) {
+		if (rqpair->qpair.queue_depth == 0) {
+			continue;
 		}
+		nvmf_rdma_qpair_process_pending(rtransport, rqpair, false);
 	}
 }

@@ -4763,7 +4765,7 @@ nvmf_rdma_poller_poll(struct spdk_nvmf_rdma_transport *rtransport,
 			rqpair->current_recv_depth++;
 			rdma_recv->receive_tsc = poll_tsc;
 			rpoller->stat.requests++;
-			STAILQ_INSERT_HEAD(&rqpair->resources->incoming_queue, rdma_recv, link);
+			STAILQ_INSERT_TAIL(&rqpair->resources->incoming_queue, rdma_recv, link);
 			rqpair->qpair.queue_depth++;
 			break;
 		case RDMA_WR_TYPE_DATA:
@@ -4830,24 +4832,17 @@ nvmf_rdma_poller_poll(struct spdk_nvmf_rdma_transport *rtransport,
 			continue;
 		}
 
-		nvmf_rdma_qpair_process_pending(rtransport, rqpair, false);
-
 		if (spdk_unlikely(!spdk_nvmf_qpair_is_active(&rqpair->qpair))) {
 			nvmf_rdma_destroy_drained_qpair(rqpair);
 		}
 	}
 
+	nvmf_rdma_poller_process_pending_qpairs(rtransport, rpoller);
+
 	if (spdk_unlikely(error == true)) {
 		return -1;
 	}
 
-	if (reaped == 0) {
-		/* In some cases we may not receive any CQE but we still may have pending IO requests waiting for
-		 * a resource (e.g. a WR from the data_wr_pool).
-		 * We need to start processing of such requests if no CQE reaped */
-		nvmf_rdma_poller_process_pending_buf_queue(rtransport, rpoller);
-	}
-
 	/* submit outstanding work requests. */
 	_poller_submit_recvs(rtransport, rpoller);
 	_poller_submit_sends(rtransport, rpoller);