This situation was observed in the bdev_lock_lba_range_check_io() function and occurred in the following scenario:

1. An incoming I/O request is sent to the raid5f bdev.
2. The I/O is split into two child I/Os.
3. The first child I/O is submitted.
4. A range lock is applied to the entire raid5f bdev due to a failure in one of its member devices.
5. The second child I/O is submitted. Because the range is locked, bdev_io_submit() places it on the io_locked list.
6. Consequently, the locking procedure gets stuck in bdev_lock_lba_range_check_io().

The locking procedure expects that no I/O on the io_submitted list overlaps the locked range. However, the parent I/O, which overlaps the range, is still waiting for the second child I/O to complete. Since the second child I/O sits on the io_locked list, it is never processed, so the parent I/O never completes. This results in a deadlock: the lock waits for the parent I/O, and the parent I/O waits for the lock to be released.

Note that this issue can only occur when bdev_io_split() does not submit all child I/Os immediately. The unit test simulates a scenario where the pool does not have enough bdev_ios available, so the second child I/O can only be submitted after the first one completes. This could also happen if, for example, the split runs out of child iovecs.

The second child I/O is not allowed to proceed because the range is locked and no new I/Os are accepted. But it is not a new I/O: it is part of an I/O that was submitted before the range was locked, and it needs to go through. Otherwise, neither the parent I/O nor the lock can complete.

This commit fixes the issue by not applying the range lock to an I/O that is a child of a split parent I/O.

Change-Id: I0445815f0bf8af70f59bea0a4f7649552d4eeed2
Signed-off-by: Daniel Nowak <daniel.nowak@solidigm.com>
Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@solidigm.com>
Reviewed-on: https://review.spdk.io/c/spdk/spdk/+/26149
Tested-by: SPDK Automated Test System <spdkbot@gmail.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Jim Harris <jim.harris@nvidia.com>
Reviewed-by: Ben Walker <ben@nvidia.com>