Commit 43ad7feb authored by Michael Haeuptle, committed by Tomasz Zawadzki

lib/nvmf: Fixes stuck subsystem RPC



A subsystem pause RPC does not complete while there are I/Os
outstanding (tracked per subsystem poll group), because the subsystem
is not transitioned to the paused state until that count drops to zero.

In general, AERs are not tracked as outstanding I/Os. However,
there are 3 paths in nvmf_ctrlr_async_event_request() that return
without adjusting the outstanding I/O count.
If we take any of these 3 paths, the subsystem pause can hang
forever.

The issue was reproduced with hot plug stress testing under load.
We can get into the second path (SPDK_NVME_ASYNC_EVENT_TYPE_NOTICE)
under these circumstances:
- An AER completion is sent to the initiator due to a namespace change
  (e.g. hot remove/add).
- In this case, type is set to SPDK_NVME_ASYNC_EVENT_TYPE_NOTICE.
- The initiator sends a new AER admin command, hitting the second path,
  where we return without adjusting the outstanding I/O count.
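
The accounting problem described above can be sketched with a small toy model. This is not SPDK code: every `toy_*` name is hypothetical, only two of the three early-return paths are modelled, and it assumes that the count is incremented when a request is accepted and that the generic completion path skips AERs, so the AER handler itself has to undo the increment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_ASYNC_EVENTS 4

/* Hypothetical stand-in for the per-subsystem poll group state. */
struct toy_sgroup {
	int io_outstanding;
};

/* Hypothetical stand-in for the controller's AER bookkeeping. */
struct toy_ctrlr {
	int nr_aer_reqs;     /* AERs currently parked on the controller */
	bool notice_pending; /* a notice event is waiting to be reported */
};

enum toy_status { TOY_COMPLETE, TOY_ASYNCHRONOUS };

typedef enum toy_status (*toy_aer_handler)(struct toy_sgroup *, struct toy_ctrlr *);

/* Pre-fix ordering: the decrement happens only on the final path. */
static enum toy_status
toy_aer_buggy(struct toy_sgroup *sg, struct toy_ctrlr *c)
{
	assert(sg != NULL && c != NULL);
	if (c->nr_aer_reqs >= TOY_MAX_ASYNC_EVENTS) {
		return TOY_COMPLETE; /* early return, count not adjusted */
	}
	if (c->notice_pending) {
		c->notice_pending = false;
		return TOY_COMPLETE; /* early return, count not adjusted */
	}
	sg->io_outstanding--;
	c->nr_aer_reqs++;
	return TOY_ASYNCHRONOUS;
}

/* Post-fix ordering: the decrement happens before any early return. */
static enum toy_status
toy_aer_fixed(struct toy_sgroup *sg, struct toy_ctrlr *c)
{
	assert(sg != NULL && c != NULL);
	sg->io_outstanding--;
	if (c->nr_aer_reqs >= TOY_MAX_ASYNC_EVENTS) {
		return TOY_COMPLETE;
	}
	if (c->notice_pending) {
		c->notice_pending = false;
		return TOY_COMPLETE;
	}
	c->nr_aer_reqs++;
	return TOY_ASYNCHRONOUS;
}

/* Assumed submit path: count the request, then run the AER handler. */
static enum toy_status
toy_submit_aer(struct toy_sgroup *sg, struct toy_ctrlr *c, toy_aer_handler handler)
{
	sg->io_outstanding++;
	return handler(sg, c);
}

/* In this model a pause may complete only when nothing is outstanding. */
static bool
toy_can_pause(const struct toy_sgroup *sg)
{
	return sg->io_outstanding == 0;
}
```

With `notice_pending` set, submitting an AER through `toy_aer_buggy` leaves `io_outstanding` at 1, so `toy_can_pause()` never becomes true. The same sequence through `toy_aer_fixed` returns the count to 0.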

Fixes: 1552
Change-Id: I45f781966cc1e9a601b2305c7985a21154d802e8
Signed-off-by: Michael Haeuptle <michael.haeuptle@hpe.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3854


Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Seth Howell <seth.howell@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: JinYu <jin.yu@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
parent 59c1e169
+5 −5
@@ -1578,6 +1578,11 @@ nvmf_ctrlr_async_event_request(struct spdk_nvmf_request *req)

 	SPDK_DEBUGLOG(SPDK_LOG_NVMF, "Async Event Request\n");
 
+	/* AER cmd is an exception */
+	sgroup = &req->qpair->group->sgroups[ctrlr->subsys->id];
+	assert(sgroup != NULL);
+	sgroup->io_outstanding--;
+
 	/* Four asynchronous events are supported for now */
 	if (ctrlr->nr_aer_reqs >= NVMF_MAX_ASYNC_EVENTS) {
 		SPDK_DEBUGLOG(SPDK_LOG_NVMF, "AERL exceeded\n");
@@ -1600,11 +1605,6 @@ nvmf_ctrlr_async_event_request(struct spdk_nvmf_request *req)
 		return SPDK_NVMF_REQUEST_EXEC_STATUS_COMPLETE;
 	}
 
-	/* AER cmd is an exception */
-	sgroup = &req->qpair->group->sgroups[ctrlr->subsys->id];
-	assert(sgroup != NULL);
-	sgroup->io_outstanding--;
-
 	ctrlr->aer_req[ctrlr->nr_aer_reqs++] = req;
 	return SPDK_NVMF_REQUEST_EXEC_STATUS_ASYNCHRONOUS;
 }