Commit f3febe42 authored by Michal Berger, committed by Tomasz Zawadzki

test/nvme: Make sure sw_hotplug works only with selected nvmes



By default, SPDK's hotplug operates on EVERY nvme ctrl it finds.
This breaks the tgt_run_hotplug() test since the count of available
bdevs, after ctrls are removed from the bus, doesn't match what
we expect to see. Fix this by making sure only selected nvmes
are moved to userspace.

Also, remove the unnecessary calls to bdev_nvme_attach_controller().

Also, also, timing_cmd() is "fixed" to make sure it returns the exit
status of the command whose execution time it reports. It was
accidentally masking the status of remove_attach_helper() due to the
way it was being called. The fix is done here because, had it been
"fixed" separately, sw_hotplug would have simply started failing.

In short, the problem was here:

  $ foo=$(timing_cmd bar) <- this executes in a subshell, errexit
                             is not inherited
  # timing_cmd() executes bar and ignores its exit status, simply
  # returns its time of execution. Due to that, entire execution
  # of remove_attach_helper() down the stack is not controlled by
  # errexit.
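The scenario above can be reproduced stand-alone. The sketch below uses hypothetical stand-ins: broken_timing_cmd() mimics the old timing_cmd(), fixed_timing_cmd() mimics the patched one, and bar is just a command that always fails (it plays the role of remove_attach_helper()):

```shell
#!/usr/bin/env bash
set -e

bar() { return 1; } # stand-in for remove_attach_helper(): always fails

# Old behavior: the wrapper's exit status is that of the final echo,
# so bar's failure is silently discarded.
broken_timing_cmd() (
	time=$({ time "$@" > /dev/null 2>&1; } 2>&1)
	echo "$time"
)

# New behavior: remember the command's exit status and propagate it.
fixed_timing_cmd() (
	local cmd_es=0
	time=$({ time "$@" > /dev/null 2>&1; } 2>&1) || cmd_es=$?
	echo "$time"
	return "$cmd_es"
)

# Note: bash (when not in POSIX mode) clears errexit inside $() subshells,
# so neither call below aborts the script on its own.
ts=$(broken_timing_cmd bar) && echo "broken wrapper: bar's failure was masked"
ts=$(fixed_timing_cmd bar) || echo "fixed wrapper: bar's failure was caught"
```

With the broken wrapper, `$(...)` reports success even though bar failed; the fixed wrapper lets the caller's `||`/errexit handling see the real status.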

The above scenario was organically fixed by 7a8d3990, but since it
also exposed an issue with the underlying sw_hotplug test, it affected
the entire per-patch job, together with 1826c4dc, which increased the
number of sw_hotplug instances.

The next patch in the series brings back 7a8d3990, but the "fix" for
timing_cmd() should remain, as it addresses the root cause of the
false positives across the whole sw_hotplug suite in the first place.

Change-Id: Iaed5b1080c547a86e262a895b5d2ccf10691e7bc
Signed-off-by: Michal Berger <michal.berger@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/23170


Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Konrad Sztyber <konrad.sztyber@intel.com>
Reviewed-by: Jim Harris <jim.harris@samsung.com>
parent d8804f6b
+4 −2
@@ -702,6 +702,7 @@ function timing_cmd() (
 	# The use-case here is this: ts=$(timing_cmd echo bar). Since stdout is always redirected
 	# to a pipe handling the $(), lookup the stdin's device and determine if it's sane to send
 	# cmd's output to it. If not, just null it.
+	local cmd_es=$?
 
 	[[ -t 0 ]] && exec {cmd_out}>&0 || exec {cmd_out}> /dev/null

@@ -711,9 +712,10 @@ function timing_cmd() (
 	# catch only output from the time builtin - output from the actual cmd would be still visible,
 	# but $() will return just the time's data, hence making it possible to just do:
 	#  time_of_super_verbose_cmd=$(timing_cmd super_verbose_cmd)
-	time=$({ time "$@" >&"$cmd_out" 2>&1; } 2>&1)
-
+	time=$({ time "$@" >&"$cmd_out" 2>&1; } 2>&1) || cmd_es=$?
 	echo "$time"
+
+	return "$cmd_es"
 )

function timing_enter() {
+24 −8
@@ -8,6 +8,11 @@ rootdir=$(readlink -f $testdir/../..)
 source $rootdir/scripts/common.sh
 source $rootdir/test/common/autotest_common.sh
 
+bdev_bdfs() {
+	jq -r '.[].driver_specific.nvme[].pci_address' \
+		<(rpc_cmd bdev_get_bdevs) | sort -u
+}
+
 # Pci bus hotplug
 # Helper function to remove/attach controllers
 debug_remove_attach_helper() {
@@ -37,8 +42,15 @@ remove_attach_helper() {

 		if "$use_bdev"; then
 			# Since we removed all the devices, when the sleep settles, we expect to find no bdevs
-			sleep "$hotplug_wait" && (($(rpc_cmd bdev_get_bdevs | jq 'length') == 0))
-		fi || return 1
+			# FIXME: For some unknown reason, SPDK may stay behind, still returning bdevs on the
+			# list which are not on the bus anymore. This happens until nvme_pcie_qpair_abort_trackers()
+			# finally returns (usually reporting an error while aborting outstanding commands).
+			# It's been noticed that it takes significant amount of time especially under ubuntu2004
+			# in the CI.
+			while bdfs=($(bdev_bdfs)) && ((${#bdfs[@]} > 0)) && sleep 0.5; do
+				printf 'Still waiting for %s to be gone\n' "${bdfs[@]}" >&2
+			done
+		fi
 
 		# Avoid setup.sh as it does some extra work which is not relevant for this test.
 		echo 1 > "/sys/bus/pci/rescan"
@@ -55,7 +67,7 @@ remove_attach_helper() {

 		if "$use_bdev"; then
 			# See if we get all the bdevs back in one bulk
-			bdfs=($(rpc_cmd bdev_get_bdevs | jq -r '.[].driver_specific.nvme[].pci_address' | sort))
+			bdfs=($(bdev_bdfs))
 			[[ ${bdfs[*]} == "${nvmes[*]}" ]]
 		fi
 	done
@@ -100,11 +112,6 @@ tgt_run_hotplug() {
 	trap 'killprocess ${spdk_tgt_pid}; echo 1 > /sys/bus/pci/rescan; exit 1' SIGINT SIGTERM EXIT
 	waitforlisten $spdk_tgt_pid
 
-	for dev in "${!nvmes[@]}"; do
-		rpc_cmd bdev_nvme_attach_controller -b "Nvme0$dev" -t PCIe -a "${nvmes[dev]}"
-		waitforbdev "Nvme0${dev}n1" "$hotplug_wait"
-	done
-
 	rpc_cmd bdev_nvme_set_hotplug -e
 
 	debug_remove_attach_helper "$hotplug_events" "$hotplug_wait" true
@@ -127,6 +134,12 @@ nvmes=($(nvme_in_userspace))
 nvme_count=$((${#nvmes[@]} > 2 ? 2 : ${#nvmes[@]}))
 nvmes=("${nvmes[@]::nvme_count}")
 
+# Let's dance! \o\ \o/ /o/ \o/
+"$rootdir/scripts/setup.sh" reset
+# Put on your red shoes ...
+PCI_ALLOWED="${nvmes[*]}" "$rootdir/scripts/setup.sh"
+# Let's sway! \o\ \o/ /o/ \o/
+
 xtrace_disable
 cache_pci_bus
 xtrace_restore
@@ -136,3 +149,6 @@ run_hotplug

 # Run SPDK target based hotplug
 tgt_run_hotplug
+
+# Under the moonlight, this serious moonlight! \o/
+"$rootdir/scripts/setup.sh"
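The drain loop introduced in remove_attach_helper() can be exercised stand-alone. In the sketch below bdev_bdfs() is a hypothetical stub (the real helper wraps `rpc_cmd bdev_get_bdevs`): it reports one BDF for the first three polls and then an empty list, mimicking SPDK lagging behind the bus removal:

```shell
#!/usr/bin/env bash
# Stand-alone sketch of the "wait until bdevs are gone" polling loop.
# A file-backed counter is used because bdev_bdfs() runs inside $(),
# i.e. a subshell, where plain variable updates would not persist.
countfile=$(mktemp)
echo 3 > "$countfile"

bdev_bdfs() { # hypothetical stub: drains after three polls
	local n
	n=$(<"$countfile")
	((n > 0)) && printf '0000:00:10.0\n'
	echo $((n - 1)) > "$countfile"
}

# Same shape as the loop in sw_hotplug.sh: keep polling while the list
# is non-empty, sleeping briefly between polls.
while bdfs=($(bdev_bdfs)) && ((${#bdfs[@]} > 0)) && sleep 0.1; do
	printf 'Still waiting for %s to be gone\n' "${bdfs[@]}" >&2
done
echo "all bdevs gone"
rm -f "$countfile"
```

Note the loop condition: the `sleep` only runs while the list is non-empty, so an already-empty list exits immediately without any delay.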