Commit 6d9d3f87 authored by Jim Harris, committed by Tomasz Zawadzki

bdev_nvme: don't try to read csts on fabrics cmd timeout



It is recommended to read CSTS when there is a timeout.
If CSTS.CFS (Controller Fatal Status) is set, we should
reset the controller.

But if an admin command on a fabrics controller times
out, reading CSTS submits another fabrics command that
could also time out.  Even worse, we are recursively
polling the admin queue for completions in this case.

Fixes issue #1716.
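
The guard described above can be sketched in isolation as a small
predicate. This is only an illustration of the condition, not the SPDK
code itself: the enum and function names below are hypothetical
stand-ins, and the only facts taken from this change are that CSTS is
read for PCIe controllers or when the timeout is on an I/O queue, and
that qpair == NULL marks an admin command timeout.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the transport type; not SPDK's real enum. */
enum sketch_trtype {
	SKETCH_TRTYPE_PCIE,
	SKETCH_TRTYPE_FABRICS
};

/*
 * Return true when the timeout callback may read CSTS.
 *
 * On PCIe, CSTS is a register read, so it is always allowed.
 * On a fabrics transport, reading CSTS submits another admin command,
 * so it is only allowed when the timeout was on an I/O queue
 * (qpair != NULL). An admin timeout (qpair == NULL) skips the read.
 */
static bool
sketch_can_read_csts(enum sketch_trtype trtype, const void *qpair)
{
	return trtype == SKETCH_TRTYPE_PCIE || qpair != NULL;
}
```

With this predicate, the only combination that skips the CSTS read is a
fabrics controller with an admin command timeout, which is the case that
would otherwise poll the admin queue recursively.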

Signed-off-by: Jim Harris <james.r.harris@intel.com>
Change-Id: I23d31f6302375c52eba6f4370748d622fbd25ca7
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/5513


Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
parent 037f6bda
+12 −5
@@ -1231,12 +1231,19 @@ timeout_cb(void *cb_arg, struct spdk_nvme_ctrlr *ctrlr,

	SPDK_WARNLOG("Warning: Detected a timeout. ctrlr=%p qpair=%p cid=%u\n", ctrlr, qpair, cid);

	/* Only try to read CSTS if it's a PCIe controller or we have a timeout on an I/O
	 * queue.  (Note: qpair == NULL when there's an admin cmd timeout.)  Otherwise we
	 * would submit another fabrics cmd on the admin queue to read CSTS and check for its
	 * completion recursively.
	 */
	if (nvme_bdev_ctrlr->connected_trid->trtype == SPDK_NVME_TRANSPORT_PCIE || qpair != NULL) {
		csts = spdk_nvme_ctrlr_get_regs_csts(ctrlr);
		if (csts.bits.cfs) {
			SPDK_ERRLOG("Controller Fatal Status, reset required\n");
			_bdev_nvme_reset(nvme_bdev_ctrlr, NULL);
			return;
		}
	}

	switch (g_opts.action_on_timeout) {
	case SPDK_BDEV_NVME_TIMEOUT_ACTION_ABORT: