
ZTS: resilver_restart_001 improvements#18434

Merged
behlendorf merged 1 commit into openzfs:master from behlendorf:zts-resilver_restart_001
Apr 16, 2026

@behlendorf
Contributor

Motivation and Context

Resolve the occasional CI failures for this test.

https://github.com/openzfs/zfs/actions/runs/24217052638/job/70700070007?pr=18387

Description

The resilver_restart_001 test case has not been entirely reliable when run under the CI. Address several small issues which may be responsible.

  • Configure the pool as raidz2 instead of raidz1 since the test offlines two devices. This ensures the second device is marked as OFFLINE instead of DEGRADED.

  • Start the zpool replace after setting SCAN_SUSPEND_PROGRESS to close any potential race where the replace finishes too quickly.

  • Wait for the offlined/onlined vdevs to fully transition to the expected state during the test.

  • Add the true flag to sync_pool to force a TXG sync to happen even if it might not otherwise be required.

  • During cleanup dump the zpool events history to aid debugging if the updated test case is still unreliable in the CI.
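
The "wait for the expected state" item above amounts to polling with a timeout rather than checking once. The following is a minimal sketch of that pattern in plain shell, not the code from this PR: `wait_for_state` and `fake_vdev_state` are hypothetical names, and the fake state query stands in for whatever command the test uses to read a vdev's state.

```shell
#!/bin/sh
# Hypothetical helper, not the ZTS implementation: poll a command until it
# prints the expected state or the timeout (in seconds) expires.
wait_for_state() {
    expected=$1
    timeout=$2
    shift 2
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        if [ "$("$@")" = "$expected" ]; then
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 1
}

# Stand-in for a real state query: reports OFFLINE only from the third call.
# The counter lives in a file because the function is invoked inside $(...),
# which runs in a subshell and cannot update the caller's variables.
counter_file=$(mktemp)
echo 0 > "$counter_file"
fake_vdev_state() {
    n=$(cat "$counter_file")
    n=$((n + 1))
    echo "$n" > "$counter_file"
    if [ "$n" -ge 3 ]; then
        echo OFFLINE
    else
        echo ONLINE
    fi
}

if wait_for_state OFFLINE 10 fake_vdev_state; then
    result="reached OFFLINE"
else
    result="timed out"
fi
echo "$result"
rm -f "$counter_file"
```

A single immediate check would see ONLINE here and fail, while the polling loop tolerates the delay and still fails cleanly if the state is never reached.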

How Has This Been Tested?

Tested locally, but will be verified by the CI.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Quality assurance (non-breaking change which makes the code more robust against bugs)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

@behlendorf behlendorf added the Status: Code Review Needed Ready for review and testing label Apr 15, 2026
@github-actions github-actions Bot added the Status: Work in Progress Not yet ready for general review label Apr 15, 2026
@behlendorf behlendorf marked this pull request as ready for review April 16, 2026 02:06
Copilot AI review requested due to automatic review settings April 16, 2026 02:06
@behlendorf behlendorf force-pushed the zts-resilver_restart_001 branch from 04d8bff to b97858d Compare April 16, 2026 02:07
@github-actions github-actions Bot removed the Status: Work in Progress Not yet ready for general review label Apr 16, 2026

Copilot AI left a comment

Pull request overview

Improves reliability of the resilver_restart_001 ZTS test by reducing races and making vdev state transitions deterministic, aiming to eliminate intermittent CI failures.

Changes:

  • Switch test pool topology to raidz2 to better tolerate multiple device disruptions during the test.
  • Reduce replace/suspend timing races by setting SCAN_SUSPEND_PROGRESS before starting zpool replace.
  • Add explicit waits for vdev state transitions and force TXG syncs (sync_pool ... true) to stabilize sequencing; dump zpool events during cleanup for debugging.
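
The ordering change in the second item can be illustrated with a toy model. This is only a sketch under stated assumptions: `suspended`, `resilver_finished`, `start_replace`, and `run_scenario` are hypothetical stand-ins, not ZFS tunables or commands, and the model shows the worst case of a race that in practice occurs only occasionally.

```shell
#!/bin/sh
# Toy model only: these variables and functions are hypothetical stand-ins,
# not ZFS tunables or commands.
run_scenario() {
    order=$1
    suspended=0
    resilver_finished=0

    # Stand-in for starting a replace: with nothing holding the scan back,
    # a small resilver may complete before the test can look at it.
    start_replace() {
        if [ "$suspended" -eq 0 ]; then
            resilver_finished=1
        fi
    }

    case $order in
    replace_first)
        start_replace
        suspended=1
        ;;
    suspend_first)
        suspended=1
        start_replace
        ;;
    esac

    if [ "$resilver_finished" -eq 1 ]; then
        echo "finished early"
    else
        echo "still in progress"
    fi
}

racy=$(run_scenario replace_first)
fixed=$(run_scenario suspend_first)
echo "replace first: $racy"
echo "suspend first: $fixed"
```

With the suspend in place before the replace starts, there is no window in which the resilver can run to completion unobserved.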

Comment thread tests/zfs-tests/tests/functional/replacement/resilver_restart_001.ksh Outdated
The resilver_restart_001 test case has not been entirely reliable
when run under the CI.  Address several small issues which may be
responsible.

- Configure the pool as raidz2 instead of raidz1 since the test
  offlines two devices.  This ensures the second device is marked
  as OFFLINE instead of DEGRADED.

- Start the zpool replace after setting SCAN_SUSPEND_PROGRESS to
  close any potential race where the replace finishes too quickly.

- Wait for the offlined/onlined vdevs to fully transition to the
  expected state during the test.

- Add the true flag to sync_pool to force a TXG sync to happen
  even if it might not otherwise be required.

- During cleanup dump the zpool events history to aid debugging
  if the updated test case is still unreliable in the CI.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
@behlendorf behlendorf force-pushed the zts-resilver_restart_001 branch from b97858d to cbda331 Compare April 16, 2026 16:20
@behlendorf behlendorf requested a review from tonyhutter April 16, 2026 18:02
@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Apr 16, 2026
@behlendorf behlendorf merged commit b32911b into openzfs:master Apr 16, 2026
34 of 45 checks passed

Labels

Status: Accepted Ready to integrate (reviewed, tested)

3 participants