A subsystem pause RPC does not transition the subsystem to the paused state
while there are I/Os outstanding (tracked by the subsystem poll group).

In general, AERs are not tracked as outstanding I/Os. However, there are
3 paths in nvmf_ctrlr_async_event_request which do not adjust the
outstanding I/O count. If we get into any of these 3 paths, the subsystem
pause can hang forever.

The issue was reproduced with hot plug stress testing under load.

We can get into the second path (SPDK_NVME_ASYNC_EVENT_TYPE_NOTICE) under
these circumstances:
- An AER completion is sent to the initiator due to a namespace change
  (e.g. hot remove/add)
- In this case, type is set to SPDK_NVME_ASYNC_EVENT_TYPE_NOTICE
- The initiator sends a new AER admin command, hitting the second path
  where we return without adjusting the outstanding I/O count.

Fixes: 1552

Change-Id: I45f781966cc1e9a601b2305c7985a21154d802e8
Signed-off-by: Michael Haeuptle <michael.haeuptle@hpe.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3854
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Seth Howell <seth.howell@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: JinYu <jin.yu@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
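
The accounting problem described in the message can be illustrated with a small standalone model. This is a sketch only, not the SPDK code or the actual patch: every `model_*` name is hypothetical, and it assumes that the outstanding count is incremented when a request is submitted and that AER completions do not decrement it. Only the second early-return path (a pending notice) is named in the message; the first and third paths here are illustrative guesses.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins; these are not SPDK's structures. */
struct model_sgroup {
    int io_outstanding;        /* requests the pause logic waits on */
};

struct model_ctrlr {
    int  nr_aer_reqs;          /* AERs currently parked on the controller */
    bool notice_pending;       /* e.g. a namespace-change notice is waiting */
    bool reservation_pending;  /* assumed third early-return condition */
};

enum model_status { MODEL_COMPLETE, MODEL_ASYNCHRONOUS };

#define MODEL_MAX_AERS 4       /* assumed limit, for illustration only */

static enum model_status
model_async_event_request(struct model_sgroup *sg, struct model_ctrlr *c)
{
    /*
     * AERs are not tracked as outstanding I/O. Undo the submit-time
     * increment before any return so that every path is covered.
     */
    assert(sg->io_outstanding > 0);
    sg->io_outstanding--;

    if (c->nr_aer_reqs >= MODEL_MAX_AERS) {   /* path 1: too many AERs */
        return MODEL_COMPLETE;
    }
    if (c->notice_pending) {                  /* path 2: pending notice */
        c->notice_pending = false;
        return MODEL_COMPLETE;
    }
    if (c->reservation_pending) {             /* path 3: pending event */
        c->reservation_pending = false;
        return MODEL_COMPLETE;
    }

    c->nr_aer_reqs++;                         /* park the AER */
    return MODEL_ASYNCHRONOUS;
}

/* Submission counts every request, then dispatches the AER handler. */
static enum model_status
model_submit_aer(struct model_sgroup *sg, struct model_ctrlr *c)
{
    sg->io_outstanding++;
    return model_async_event_request(sg, c);
}

/* In this model, a pause may proceed only when nothing is outstanding. */
static bool
model_can_pause(const struct model_sgroup *sg)
{
    return sg->io_outstanding == 0;
}
```

If the decrement were placed only before the final parking step, an AER arriving while `notice_pending` is set would leave `io_outstanding` at 1 and `model_can_pause` would never return true, which corresponds to the hang described above.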