Commit e0ab59f1 authored by Alex Michon, committed by Konrad Sztyber

lib/nvme: Fail ctrlr reconnection attempt early if ctrlr is failed

During a reset of a fabrics controller, we disconnect the controller and
then attempt to reconnect it.
Disconnecting clears the controller's `is_failed` flag and disconnects
the adminq.
If something goes wrong in the meantime, the controller may be marked
as `is_failed` again.
When the reconnection then starts, we reset the adminq's state to
CONNECTING and poll it until we get a response to our connection
request. But since the controller is marked as `is_failed` and the
adminq's state is CONNECTING, polling does nothing (see
`spdk_nvme_qpair_process_completions()`).
Moreover, the controller is in the WAIT_FOR_CONNECT_ADMINQ state with an
infinite timeout, so it may stay stuck there forever.
Let's prevent this situation by checking the `is_failed` flag before
attempting a reconnection.
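The hang described above can be reproduced with a toy model. This is a hypothetical sketch: `struct model_ctrlr`, `model_process_completions()` and `model_wait_for_connect()` are invented stand-ins, not SPDK APIs, and the early return on `is_failed` only mirrors what this message says about `spdk_nvme_qpair_process_completions()`; the real function does more. A bounded poll budget replaces the real infinite timeout so that the model terminates.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-ins for SPDK's types. */
enum qpair_state { QPAIR_DISCONNECTED, QPAIR_CONNECTING, QPAIR_CONNECTED };

struct model_ctrlr {
	bool is_failed;
	enum qpair_state adminq_state;
	bool connect_response_pending; /* a CONNECT response is waiting to be reaped */
};

/* Stand-in for spdk_nvme_qpair_process_completions(): per the commit
 * message, a failed controller short-circuits polling, so a pending
 * CONNECT response is never consumed. */
int model_process_completions(struct model_ctrlr *c)
{
	if (c->is_failed) {
		return -ENXIO;
	}
	if (c->adminq_state == QPAIR_CONNECTING && c->connect_response_pending) {
		c->connect_response_pending = false;
		c->adminq_state = QPAIR_CONNECTED;
		return 1;
	}
	return 0;
}

/* Poll at most max_polls times while the adminq is CONNECTING (the real
 * WAIT_FOR_CONNECT_ADMINQ state has no such limit). Returns the number of
 * polls that made progress. */
int model_wait_for_connect(struct model_ctrlr *c, int max_polls)
{
	int progressed = 0;

	assert(max_polls > 0);
	for (int i = 0; i < max_polls && c->adminq_state == QPAIR_CONNECTING; i++) {
		if (model_process_completions(c) > 0) {
			progressed++;
		}
	}
	return progressed;
}
```

With `is_failed` set, `model_wait_for_connect()` returns 0 for any poll budget and the adminq stays in CONNECTING even though a response is waiting; with `is_failed` cleared, the first poll completes the connection.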

Change-Id: Id83ff161e0b389fa2e266468006f619ad6bc65c1
Signed-off-by: Alex Michon <amichon@kalrayinc.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/24649
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <jim.harris@samsung.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
parent ffa1a21e
+5 −0
@@ -3982,6 +3982,11 @@ nvme_ctrlr_process_init(struct spdk_nvme_ctrlr *ctrlr)

 		switch (nvme_qpair_get_state(ctrlr->adminq)) {
 		case NVME_QPAIR_CONNECTING:
+			if (ctrlr->is_failed) {
+				nvme_transport_ctrlr_disconnect_qpair(ctrlr, ctrlr->adminq);
+				break;
+			}
+
 			break;
 		case NVME_QPAIR_CONNECTED:
 			nvme_qpair_set_state(ctrlr->adminq, NVME_QPAIR_ENABLED);
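The effect of the added `if (ctrlr->is_failed)` branch can be sketched with a similar toy model. The types and functions here are hypothetical simplifications, not SPDK code, and the handling of a DISCONNECTED adminq (moving the controller to an error state on a later pass) is an assumption about the rest of `nvme_ctrlr_process_init()`, not a copy of it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins; not SPDK's real definitions. */
enum qpair_state { QPAIR_DISCONNECTED, QPAIR_CONNECTING, QPAIR_CONNECTED, QPAIR_ENABLED };
enum ctrlr_state { CTRLR_WAIT_FOR_CONNECT_ADMINQ, CTRLR_NEXT_INIT_STEP, CTRLR_ERROR };

struct model_ctrlr {
	bool is_failed;
	enum qpair_state adminq_state;
	enum ctrlr_state state;
};

/* Stand-in for nvme_transport_ctrlr_disconnect_qpair(). */
void model_disconnect_adminq(struct model_ctrlr *c)
{
	c->adminq_state = QPAIR_DISCONNECTED;
}

/* One pass of the WAIT_FOR_CONNECT_ADMINQ step, mirroring the patched switch. */
void model_wait_for_connect_step(struct model_ctrlr *c)
{
	assert(c->state == CTRLR_WAIT_FOR_CONNECT_ADMINQ);

	switch (c->adminq_state) {
	case QPAIR_CONNECTING:
		if (c->is_failed) {
			/* New branch: stop waiting for a CONNECT response that
			 * polling will never process. */
			model_disconnect_adminq(c);
			break;
		}
		/* Healthy controller: keep waiting for the response. */
		break;
	case QPAIR_CONNECTED:
		c->adminq_state = QPAIR_ENABLED;
	/* fallthrough */
	case QPAIR_ENABLED:
		c->state = CTRLR_NEXT_INIT_STEP;
		break;
	case QPAIR_DISCONNECTED:
	default:
		/* Assumption: a disconnected adminq fails initialization. */
		c->state = CTRLR_ERROR;
		break;
	}
}
```

On the first pass, a failed controller's adminq moves from CONNECTING to DISCONNECTED; on the next pass the model leaves WAIT_FOR_CONNECT_ADMINQ instead of waiting forever. In this model the new branch only changes the adminq state and lets the existing DISCONNECTED handling drive the failure.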