There is a race between reset start and complete:

1. reset_1 is completing. It clears bdev->internal.reset_in_progress and
   sends unfreeze_channel messages to remove the queued resets of all
   channels.
2. reset_2 is starting. As bdev->internal.reset_in_progress has been
   cleared, it is inserted into the queued_resets list and starts to
   freeze channels.
3. reset_1's unfreeze_channel message removes reset_2 from the
   queued_resets list.
4. reset_2 finishes freezing channels, but the corresponding bdev_io is
   gone, resulting in a segmentation fault.

To fix this, we use a per-bdev queued_resets list instead of per-channel
ones, and nullify bdev->internal.reset_in_progress only after unfreezing
the bdev channels. In this way, we can ensure that all resets submitted
during an in-progress reset are queued and completed correctly.

Besides, we do not insert the reset that is submitted to the underlying
device into the queued_resets list, so that the list can be processed
cleanly.

Change-Id: I7cb14d790c1e20cea86e4829555d04acc408ee28
Signed-off-by: Jinlong Chen <chenjinlong.cjl@alibaba-inc.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/25371
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <smatsumoto@nvidia.com>
Reviewed-by: Konrad Sztyber <konrad.sztyber@intel.com>
Community-CI: Community CI Samsung <spdk.community.ci.samsung@gmail.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Aleksey Marchuk <alexeymar@nvidia.com>
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Reviewed-by: GangCao <gang.cao@intel.com>
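
The ordering rule described in the commit message can be illustrated with a
small standalone model. This is only a sketch and not the SPDK code: the
types and functions (bdev_model, reset_req, model_submit_reset,
model_complete_reset) are hypothetical names invented for illustration, and
the choice to complete the queued resets together with the in-progress one is
an assumption of the model. It shows a single per-bdev queue, an in-progress
reset that is kept out of that queue, and an in-progress pointer that is
cleared only as the last step of completion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model; these are NOT the SPDK structures. */
struct reset_req {
    int id;
    bool completed;
    struct reset_req *next;              /* link in the per-bdev queue */
};

struct bdev_model {
    struct reset_req *reset_in_progress; /* reset submitted to the device */
    struct reset_req *queued_head;       /* per-bdev FIFO of waiting resets */
    struct reset_req *queued_tail;
};

/*
 * Submit a reset. Returns true if it becomes the in-progress reset,
 * false if it was appended to the per-bdev queue instead.
 */
static bool model_submit_reset(struct bdev_model *b, struct reset_req *r)
{
    r->completed = false;
    r->next = NULL;

    if (b->reset_in_progress == NULL) {
        /* The reset sent to the device is kept out of the queue. */
        b->reset_in_progress = r;
        return true;
    }

    if (b->queued_tail != NULL) {
        b->queued_tail->next = r;
    } else {
        b->queued_head = r;
    }
    b->queued_tail = r;
    return false;
}

/*
 * Model of the completion path, meant to run after all channels have been
 * unfrozen. Returns the number of resets completed. Assumption of this
 * sketch: queued resets are completed together with the in-progress one.
 */
static int model_complete_reset(struct bdev_model *b)
{
    int completed = 0;

    assert(b->reset_in_progress != NULL);

    b->reset_in_progress->completed = true;
    completed++;

    /* Drain the per-bdev queue while reset_in_progress is still set, so a
     * reset arriving in this window would be queued, not started. */
    while (b->queued_head != NULL) {
        struct reset_req *r = b->queued_head;

        b->queued_head = r->next;
        r->next = NULL;
        r->completed = true;
        completed++;
    }
    b->queued_tail = NULL;

    /* Cleared last: only now may a new reset become the in-progress one. */
    b->reset_in_progress = NULL;
    return completed;
}
```

In this model, a second reset submitted while the first is still in progress
always lands in the per-bdev queue, because the in-progress pointer stays set
until the queue has been drained. That is the property the race in steps 1-4
violated.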