
Commit cbda331

ZTS: resilver_restart_001 improvements
The resilver_restart_001 test case has not been entirely reliable when run under the CI. Address several small issues which may be responsible.

- Configure the pool as raidz2 instead of raidz1 since the test offlines two devices. This ensures the second device is marked as OFFLINE instead of DEGRADED.
- Start the zpool replace after setting SCAN_SUSPEND_PROGRESS to close any potential race where the replace finishes too quickly.
- Wait for the offlined/onlined vdevs to fully transition to the expected state during the test.
- Add the true flag to sync_pool to force a TXG sync to happen even if it might not otherwise be required.
- During cleanup, dump the zpool events history to aid debugging if the updated test case is still unreliable in the CI.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
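The reordering described above (setting SCAN_SUSPEND_PROGRESS before starting the zpool replace) is a classic "arm the brake before starting the engine" fix. The sketch below is a hypothetical, ZFS-free illustration: `SUSPEND_FLAG` stands in for the tunable and `start_scan` for the resilver that `zpool replace` kicks off. Neither is part of the test suite.

```shell
#!/bin/sh
# Hypothetical illustration of the ordering fix; this is not ZFS code.
# SUSPEND_FLAG stands in for the SCAN_SUSPEND_PROGRESS tunable and
# start_scan for the resilver started by `zpool replace`.
SUSPEND_FLAG=$(mktemp -u)   # only a path: the flag file does not exist yet
RESULT=$(mktemp)

start_scan() {
	# The background "scan" checks the flag once, immediately, and then
	# either pauses or runs to completion. Real work can be this fast too.
	(
		if [ -e "$SUSPEND_FLAG" ]; then
			echo suspended > "$RESULT"
		else
			echo finished > "$RESULT"
		fi
	) &
}

# Old, racy order: start_scan; touch "$SUSPEND_FLAG"
# The scan may check the flag before it exists and finish immediately.

# New order: set the flag first, then start the scan.
touch "$SUSPEND_FLAG"
start_scan
wait
cat "$RESULT"   # prints "suspended" on every run
```

With the flag in place before the scan starts, the outcome no longer depends on how quickly the background work gets scheduled.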
1 parent 1644e2f commit cbda331

1 file changed

Lines changed: 12 additions & 7 deletions

tests/zfs-tests/tests/functional/replacement/resilver_restart_001.ksh

@@ -46,18 +46,20 @@
 
 function cleanup
 {
+	log_must zpool events
 	log_must set_tunable32 RESILVER_MIN_TIME_MS $ORIG_RESILVER_MIN_TIME
 	log_must set_tunable32 SCAN_SUSPEND_PROGRESS \
 	    $ORIG_SCAN_SUSPEND_PROGRESS
 	log_must set_tunable32 RESILVER_DEFER_PERCENT \
 	    $ORIG_RESILVER_DEFER_PERCENT
 	log_must set_tunable32 ZEVENT_LEN_MAX $ORIG_ZFS_ZEVENT_LEN_MAX
 	log_must zinject -c all
+	log_must zpool events -c
 	destroy_pool $TESTPOOL1
 	rm -f ${VDEV_FILES[@]} $SPARE_VDEV_FILE
 }
 
-# count resilver events in zpool and number of deferred rsilvers on vdevs
+# count resilver events in zpool and number of deferred resilvers on vdevs
 function verify_restarts # <msg> <cnt> <defer>
 {
 	msg=$1
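The updated cleanup dumps the event history with `zpool events` before anything else and clears it with `zpool events -c` near the end, so a failed run leaves its history in the test log while the next run starts clean. A minimal sketch of that "dump, then clear" ordering follows; `EVENT_LOG`, `record_event`, and this `cleanup` are hypothetical stand-ins that use a temporary file instead of a pool.

```shell
#!/bin/sh
# Sketch of the "dump, then clear" ordering used in cleanup.
# EVENT_LOG is a hypothetical file standing in for the pool's event
# history; in the real test `zpool events` and `zpool events -c` play
# those roles.
EVENT_LOG=$(mktemp)

record_event() {
	echo "$1" >> "$EVENT_LOG"
}

cleanup() {
	# Dump first, so the history reaches the test log if something failed.
	echo "--- event history ---"
	cat "$EVENT_LOG"
	# Clear afterwards, so the next test case starts from an empty history.
	: > "$EVENT_LOG"
}

record_event "event A"
record_event "event B"
cleanup
```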
@@ -113,7 +115,7 @@ log_must set_tunable32 ZEVENT_LEN_MAX 512
 log_must truncate -s $VDEV_FILE_SIZE ${VDEV_FILES[@]} $SPARE_VDEV_FILE
 
 log_must zpool create -f -o feature@resilver_defer=disabled $TESTPOOL1 \
-    raidz ${VDEV_FILES[@]}
+    raidz2 ${VDEV_FILES[@]}
 
 # create 4 filesystems
 for fs in fs{0..3}
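In `zpool create`, `raidz` is shorthand for `raidz1` (single parity), while `raidz2` provides double parity. The commit message ties the switch to the test taking two devices out of service. Under the simplified assumption that a raidzN vdev keeps its data available while at most N children are unavailable, the difference comes down to the arithmetic below; `can_tolerate` is a hypothetical helper that encodes only that rule, not how ZFS itself decides vdev states.

```shell
#!/bin/sh
# Simplified model (an assumption, not ZFS source): a raidzN vdev keeps its
# data available while no more than N of its children are unavailable.
# can_tolerate <parity> <unavailable_children> is a hypothetical helper.
can_tolerate() {
	[ "$2" -le "$1" ]
}

for parity in 1 2; do
	for missing in 1 2; do
		if can_tolerate "$parity" "$missing"; then
			verdict="still has valid replicas"
		else
			verdict="insufficient replicas"
		fi
		echo "raidz$parity, $missing child(ren) unavailable: $verdict"
	done
done
```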
@@ -157,18 +159,21 @@ do
 	log_must set_tunable32 RESILVER_MIN_TIME_MS 20
 
 	# initiate a resilver and suspend the scan as soon as possible
-	log_must zpool replace $TESTPOOL1 $VDEV_REPLACE
 	log_must set_tunable32 SCAN_SUSPEND_PROGRESS 1
+	log_must zpool replace $TESTPOOL1 $VDEV_REPLACE
 
 	# there should only be 1 resilver start
 	verify_restarts '' "${RESTARTS[0]}" "${VDEVS[0]}"
 
 	# offline then online a vdev to introduce a new DTL range after current
 	# scan, which should restart (or defer) the resilver
 	log_must zpool offline $TESTPOOL1 ${VDEV_FILES[2]}
-	sync_pool $TESTPOOL1
+	log_must wait_vdev_state $TESTPOOL1 ${VDEV_FILES[2]} "OFFLINE"
+	sync_pool $TESTPOOL1 true
+
 	log_must zpool online $TESTPOOL1 ${VDEV_FILES[2]}
-	sync_pool $TESTPOOL1
+	log_must wait_vdev_state $TESTPOOL1 ${VDEV_FILES[2]} "ONLINE"
+	sync_pool $TESTPOOL1 true
 
 	# there should now be 2 resilver starts w/o defer, 1 with defer
 	verify_restarts ' after offline/online' "${RESTARTS[1]}" "${VDEVS[1]}"
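The new `wait_vdev_state` calls make the test wait until the offlined or onlined vdev actually reports the expected state instead of assuming the transition is instantaneous. The suite's own helper is not shown in this diff; the sketch below is a generic poll-with-timeout loop under the assumption that the state can be read by running a command. `wait_for_output` and `fake_vdev_state` are hypothetical.

```shell
#!/bin/sh
# A minimal polling sketch, assuming a state can be read by running a
# command. wait_for_output and fake_vdev_state are hypothetical helpers;
# this is not the ZTS wait_vdev_state implementation.

# wait_for_output <expected> <timeout_secs> <cmd> [args...]
# Polls about once per second; returns 0 once <cmd> prints <expected>,
# or 1 if roughly <timeout_secs> seconds pass first.
wait_for_output() {
	expected=$1
	timeout=$2
	shift 2
	elapsed=0
	while [ "$elapsed" -lt "$timeout" ]; do
		if [ "$("$@")" = "$expected" ]; then
			return 0
		fi
		sleep 1
		elapsed=$((elapsed + 1))
	done
	# Last check so a transition right at the deadline still counts.
	[ "$("$@")" = "$expected" ]
}

# Stand-in for querying a vdev: reports ONLINE for the first two queries
# and OFFLINE afterwards, mimicking an asynchronous state change. The
# counter lives in a file because $(...) runs the command in a subshell.
COUNTER=$(mktemp)
echo 0 > "$COUNTER"
fake_vdev_state() {
	n=$(cat "$COUNTER")
	echo $((n + 1)) > "$COUNTER"
	if [ "$n" -lt 2 ]; then echo ONLINE; else echo OFFLINE; fi
}

if wait_for_output OFFLINE 5 fake_vdev_state; then
	echo "vdev reached OFFLINE"
fi
```

Bounding the wait with a timeout keeps a vdev that never reaches the expected state from hanging the test run.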
@@ -190,8 +195,8 @@ do
 	log_must is_pool_resilvered $TESTPOOL1
 
 	# wait for a few txg's to see if a resilver happens
-	sync_pool $TESTPOOL1
-	sync_pool $TESTPOOL1
+	sync_pool $TESTPOOL1 true
+	sync_pool $TESTPOOL1 true
 
 	# there should now be 2 resilver starts
 	verify_restarts ' after resilver' "${RESTARTS[3]}" "${VDEVS[3]}"
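Each `sync_pool` call now passes `true` as a second argument which, per the commit message, forces a TXG sync even when one might not otherwise be required, so the "wait for a few txg's" step does not depend on there being dirty data. The body of `sync_pool` is not part of this diff. The sketch below only illustrates the common shell idiom for an optional boolean argument that defaults to false, using a hypothetical `sync_like` function that prints what it would do rather than touching a pool.

```shell
#!/bin/sh
# Hypothetical helper illustrating an optional boolean argument that
# defaults to false, the same calling shape as `sync_pool $TESTPOOL1 true`.
# It only prints what it would do; it does not touch any pool.
sync_like() {
	target=${1:?usage: sync_like <target> [force]}
	force=${2:-false}
	if [ "$force" = true ]; then
		echo "force a sync of $target"
	else
		echo "sync $target only if there is work to do"
	fi
}

sync_like testpool          # prints: sync testpool only if there is work to do
sync_like testpool true     # prints: force a sync of testpool
```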
