When an NVMe controller faults (fast-io-fail + reconnect), NVMe retry
I/Os may complete FAILED (propagated as ENXIO). RAID then schedules
base-bdev removal. During removal we quiesce the RAID bdev and the bdev
layer starts tearing down core channels. Later, while destroying module
channels, the NVMe module finishes aborting remaining retry I/Os; those
completions re-enter the RAID path and spdk_bdev_free_io() dereferences
a core channel that was already freed, causing a use-after-free.

Fix: #3703
After the RAID bdev is quiesced, issue spdk_bdev_reset() to the base
bdev so outstanding I/Os are drained/aborted before core-channel
resources are released. Continue the existing removal flow only after
reset completion.

Change-Id: Ifa715c06ceecc8eed709e3049484eaf50663d9a0
Signed-off-by: jinhong.kim0 <jinhong.kim0@navercorp.com>
Reviewed-on: https://review.spdk.io/c/spdk/spdk/+/26443
Community-CI: Mellanox Build Bot
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-by: Artur Paszkiewicz <artur.paszkiewicz@solidigm.com>
Tested-by: SPDK Automated Test System <spdkbot@gmail.com>
Reviewed-by: Jim Harris <jim.harris@nvidia.com>
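
The ordering problem described in the commit message can be illustrated with a small standalone model. This is only a sketch under stated assumptions: every type and function name below (mock_base_bdev, mock_reset, continue_removal, remove_base_bdev, and so on) is hypothetical and none of it is SPDK code. mock_reset() merely stands in for the role spdk_bdev_reset() plays in the fix, and the model counts completions that arrive after the core channel is gone instead of actually touching freed memory. The model also continues removal regardless of the reset status, which is an assumption and not something the commit message states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a RAID base bdev; not an SPDK structure. */
struct mock_base_bdev {
    int outstanding_io;    /* retry I/Os still held by the lower module */
    bool channel_freed;    /* core channel resources already released */
    bool removed;          /* removal flow has finished */
    int late_completions;  /* completions seen after the channel was freed */
};

/* Hypothetical completion callback shape for the mock reset. */
typedef void (*mock_reset_done_cb)(bool success, void *cb_arg);

/* Complete one I/O. In the real bug, a completion arriving after the
 * core channel was freed is the use-after-free; here it is only counted. */
static void complete_one_io(struct mock_base_bdev *b)
{
    if (b->channel_freed) {
        b->late_completions++;
    }
    b->outstanding_io--;
}

/* Existing removal flow (modeled): release the core channel, then let
 * the lower module abort whatever it still holds while its own
 * channels are destroyed. */
static void continue_removal(struct mock_base_bdev *b)
{
    b->channel_freed = true;
    while (b->outstanding_io > 0) {
        complete_one_io(b);
    }
    b->removed = true;
}

/* Models the reset step: drain/abort all outstanding I/O first,
 * then report completion through the callback. */
static int mock_reset(struct mock_base_bdev *b, mock_reset_done_cb cb,
                      void *cb_arg)
{
    assert(cb != NULL);
    while (b->outstanding_io > 0) {
        complete_one_io(b);
    }
    cb(true, cb_arg);
    return 0;
}

/* Removal resumes only from the reset completion callback.
 * Assumption: the sketch proceeds even if the reset reports failure. */
static void reset_done(bool success, void *cb_arg)
{
    (void)success;
    continue_removal(cb_arg);
}

/* Entry point, called once the RAID bdev is assumed to be quiesced.
 * reset_first == false models the old ordering, true the fixed one. */
static int remove_base_bdev(struct mock_base_bdev *b, bool reset_first)
{
    if (reset_first) {
        return mock_reset(b, reset_done, b);
    }
    continue_removal(b);
    return 0;
}
```

Calling remove_base_bdev() on a mock with three outstanding I/Os and reset_first set to false leaves late_completions at 3, which corresponds to the completions that would have dereferenced the freed channel. With reset_first set to true, the I/Os are drained before the channel is released and late_completions stays at 0.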