Commit ff0a7dfc authored by yidong0635's avatar yidong0635 Committed by Changpeng Liu
Browse files

nvme: Handle CQ polling failures by marking the controller as failed.



nvme_transport_qpair_process_completions calls nvme_rdma_qpair_process_completions
There are some cases return -1 due to failure of "CQ errors".

Handle CQ polling failures by marking the controller as failed.
That a completion with an error will be treated as controller failed.
Requests will be aborted after retry counter exceeded. Otherwise, code will keep on
reporting errors without recovery.

This is to fix issue #850.

Change-Id: I0b324232310e107bf7fd5722aca54d402a19b14d
Signed-off-by: default avataryidong0635 <dongx.yi@intel.com>
Reviewed-on: https://review.gerrithub.io/c/spdk/spdk/+/460569


Tested-by: default avatarSPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: default avatarBen Walker <benjamin.walker@intel.com>
Reviewed-by: default avatarShuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: default avatarChangpeng Liu <changpeng.liu@intel.com>
parent 16fdf466
Loading
Loading
Loading
Loading
+4 −0
Original line number Diff line number Diff line
@@ -449,6 +449,10 @@ spdk_nvme_qpair_process_completions(struct spdk_nvme_qpair *qpair, uint32_t max_

	qpair->in_completion_context = 1;
	ret = nvme_transport_qpair_process_completions(qpair, max_completions);
	if (ret < 0) {
		SPDK_ERRLOG("CQ error, abort requests after transport retry counter exceeded\n");
		qpair->ctrlr->is_failed = true;
	}
	qpair->in_completion_context = 0;
	if (qpair->delete_after_completion_context) {
		/*