Commit f05ee5e8 authored by Jacek Kalwas's avatar Jacek Kalwas Committed by Tomasz Zawadzki
Browse files

bdev/nvme: register ctrlr early to avoid spdk_for_each_channel error



With the previous code, it was possible to sporadically hit:
  spdk_for_each_channel: *ERROR*: could not find io_device

This happened because nvme_ctrlr might not be registered as an
io_device but provided as an argument to spdk_for_each_channel.

See the negative path handling of the admin qpair, e.g.:
  bdev_nvme_poll_adminq ->
   bdev_nvme_failover_ctrlr ->
   _bdev_nvme_reset_ctrlr ->
   bdev_nvme_reset_destroy_qpairs ->
   nvme_ctrlr_for_each_channel
or
 bdev_nvme_poll_adminq ->
   bdev_nvme_clear_io_path_caches ->
   nvme_ctrlr_for_each_channel

Or JSON-RPC methods:
  bdev_nvme_remove_error_injection ->
    nvme_ctrlr_for_each_channel
or
  bdev_nvme_add_error_injection ->
    nvme_ctrlr_for_each_channel

With early registration, these flows simply iterate through 0 channels
and return success.

bdev_nvme_poll_adminq registration cannot be delayed because of
outstanding log page command.

Introduction of an additional flag (registered) and conditional
logic within nvme_ctrlr_for_each_channel should also work.

Fixes #3736

Change-Id: Icbc4fdbcf3a1505d34dd8ddb566992a74ba7c03e
Signed-off-by: default avatarJacek Kalwas <jacek.kalwas@nutanix.com>
Reviewed-on: https://review.spdk.io/c/spdk/spdk/+/26926


Tested-by: default avatarSPDK Automated Test System <spdkbot@gmail.com>
Reviewed-by: default avatarTomasz Zawadzki <tomasz@tzawadzki.com>
Reviewed-by: default avatarShuhei Matsumoto <smatsumoto@nvidia.com>
Community-CI: Mellanox Build Bot
Reviewed-by: default avatarPierre Lestringant <plestringant@kalrayinc.com>
parent dd0d1a4c
Loading
Loading
Loading
Loading
+18 −12
Original line number Diff line number Diff line
@@ -5797,12 +5797,6 @@ nvme_ctrlr_create_done(struct nvme_ctrlr *nvme_ctrlr,
{
	NVME_CTRLR_INFOLOG(nvme_ctrlr, "ctrlr was created\n");

	spdk_io_device_register(nvme_ctrlr,
				bdev_nvme_create_ctrlr_channel_cb,
				bdev_nvme_destroy_ctrlr_channel_cb,
				sizeof(struct nvme_ctrlr_channel),
				nvme_ctrlr->nbdev_ctrlr->name);

	/* AER callback is registered late to prevent getting the I/O channel
	 * on an unregistered controller during namespace population. */
	spdk_nvme_ctrlr_register_aer_callback(nvme_ctrlr->ctrlr, nvme_ctrlr_aer_cb, nvme_ctrlr);
@@ -5825,7 +5819,7 @@ nvme_ctrlr_init_ana_log_page_done(void *_ctx, const struct spdk_nvme_cpl *cpl)
	nvme_ctrlr->probe_ctx = NULL;

	if (spdk_nvme_cpl_is_error(cpl)) {
		nvme_ctrlr_delete(nvme_ctrlr);
		bdev_nvme_delete_ctrlr(nvme_ctrlr, false);

		if (ctx != NULL) {
			ctx->reported_bdevs = 0;
@@ -6127,17 +6121,29 @@ nvme_ctrlr_create(struct spdk_nvme_ctrlr *ctrlr,
	}

	cdata = spdk_nvme_ctrlr_get_data(ctrlr);

	if (cdata->cmic.ana_reporting) {
		rc = nvme_ctrlr_init_ana_log_page(nvme_ctrlr, ctx);
		if (rc == 0) {
			return 0;
		if (rc != 0) {
			goto err;
		}
	} else {
	}

	/* Register the I/O device early because, on the negative path handling of the admin qpair, many
	 * flows iterate through nvme_ctrlr channels, and the same applies to some JSON-RPC methods. If
	 * the device is not registered, this triggers assertions and returns a negative status. With
	 * early registration, these flows simply iterate through zero channels and return success. */
	spdk_io_device_register(nvme_ctrlr,
				bdev_nvme_create_ctrlr_channel_cb,
				bdev_nvme_destroy_ctrlr_channel_cb,
				sizeof(struct nvme_ctrlr_channel),
				nvme_ctrlr->nbdev_ctrlr->name);

	if (!cdata->cmic.ana_reporting) {
		nvme_ctrlr_create_done(nvme_ctrlr, ctx);
		return 0;
	}

	return 0;

err:
	nvme_ctrlr_delete(nvme_ctrlr);
	return rc;