chore(infra): extend disk-guard to cover bind-mount target roots #1026
Open
The disk-guard added in #1001 walked only /home/noah/data/actions-runner*/_work/*/target/ (runner-workspace target dirs totalling ~75G across 8 runners). The actual runner-disk-fill source that took intel offline on 2026-04-23 was /mnt/nvme-raid0/targets/aprender-ci/*: per-PR bind-mount target dirs from ci.yml's task-#134 isolation, holding 1.9T including a 359G orphan `debug/` dir from the pre-isolation era. Disk-guard never touched them.

Adds new BIND_MOUNT_ROOTS (default `/mnt/nvme-raid0/targets/aprender-ci`) and a prune_bind_mount_target_roots() helper:

- Always prunes `debug/` subdir (orphan, no current workflow bind-mounts it).
- Prunes PR# subdirs stale past a minute threshold (nightly: STALE_DAYS×24×60 min; pre-job: 60-min floor so fresh in-flight dirs survive full-disk recovery).
- Preserves `main` (push-to-main CI reuses it).

Space-separated BIND_MOUNT_ROOTS env var lets the same script cover sibling fleets (sovereign-ci-paiml-mcp-agent-toolkit etc.) without code changes.

Deployed to intel 2026-04-23T12:58Z; nightly dry-run confirmed no unexpected prune candidates under the new path.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Summary
Root cause timeline
Fix
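New helper `prune_bind_mount_target_roots` walks each root in `$BIND_MOUNT_ROOTS` (default `/mnt/nvme-raid0/targets/aprender-ci`):
- Always prunes the `debug/` subdir (orphan, no current workflow bind-mounts it).
- Prunes PR# subdirs stale past a minute threshold (nightly: `STALE_DAYS`×24×60 min; pre-job: 60-min floor so fresh in-flight dirs survive full-disk recovery).
- Preserves `main` (push-to-main CI reuses it).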
Space-separated `BIND_MOUNT_ROOTS` env var lets sibling fleets (sovereign-ci-paiml-mcp-agent-toolkit etc.) extend coverage via config only.
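The pruning rules described above can be sketched roughly as follows. This is an illustrative sketch, not the deployed `runner-disk-guard.sh`: the `stale_minutes` and `remove_dir` helpers, the `DRY_RUN` flag, the `nightly`/`pre-job` mode names, the `STALE_DAYS` default of 7, and the use of directory mtime as the staleness signal are all assumptions made for the example. It also assumes every subdirectory other than `main` and `debug` is a per-PR dir.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the bind-mount pruning rules; not the real script.

# Space-separated roots; the default path comes from the PR description.
BIND_MOUNT_ROOTS="${BIND_MOUNT_ROOTS:-/mnt/nvme-raid0/targets/aprender-ci}"
STALE_DAYS="${STALE_DAYS:-7}"   # assumed default
DRY_RUN="${DRY_RUN:-1}"         # assumed flag: 1 = report only, 0 = delete

# Map a run mode (assumed names) to a staleness threshold in minutes.
stale_minutes() {
  case "$1" in
    nightly) echo $(( STALE_DAYS * 24 * 60 )) ;;
    pre-job) echo 60 ;;   # 60-minute floor protects in-flight dirs
    *) echo "unknown mode: $1" >&2; return 1 ;;
  esac
}

# Delete a directory, or only report it when DRY_RUN=1.
remove_dir() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "would prune: $1"
  else
    echo "pruning: $1"
    rm -rf -- "$1"
  fi
}

prune_bind_mount_target_roots() {
  local mode="$1" threshold root dir name
  threshold="$(stale_minutes "$mode")" || return 1
  for root in $BIND_MOUNT_ROOTS; do        # unquoted on purpose: split on spaces
    [ -d "$root" ] || continue             # skip roots absent on this host
    for dir in "$root"/*/; do
      [ -d "$dir" ] || continue            # unmatched glob stays literal
      dir="${dir%/}"
      name="${dir##*/}"
      case "$name" in
        main)  continue ;;                 # push-to-main CI reuses it
        debug) remove_dir "$dir" ;;        # orphan, always pruned
        *)
          # Assumption: staleness is judged by the directory's own mtime.
          if [ -n "$(find "$dir" -maxdepth 0 -mmin +"$threshold" -print)" ]; then
            remove_dir "$dir"
          fi
          ;;
      esac
    done
  done
}
```

Under these assumptions, a sibling fleet would be covered by configuration alone, for example `BIND_MOUNT_ROOTS="/mnt/nvme-raid0/targets/aprender-ci /some/other/root" DRY_RUN=1 prune_bind_mount_target_roots nightly`, where the second root is a placeholder.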
Deployment
Deployed to intel 2026-04-23T12:58Z alongside the PR #1001 version update (intel was still on an older build). Nightly dry-run confirmed no unexpected candidates under the new path after manual cleanup.
```
$ sudo md5sum /usr/local/bin/runner-disk-guard.sh
921e055c55a2c8f1838aac6809d60840 /usr/local/bin/runner-disk-guard.sh
$ md5sum scripts/runner-infra/runner-disk-guard.sh
921e055c55a2c8f1838aac6809d60840 scripts/runner-infra/runner-disk-guard.sh
```
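The manual checksum comparison above could be wrapped in a small check for reuse after future deploys. This is only a sketch; `same_md5` is a hypothetical helper name, not part of the repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds only when both files are readable
# and have the same MD5 digest.
same_md5() {
  local a b
  [ -r "$1" ] && [ -r "$2" ] || return 2   # refuse to compare missing files
  a="$(md5sum < "$1" | cut -d' ' -f1)"
  b="$(md5sum < "$2" | cut -d' ' -f1)"
  [ "$a" = "$b" ]
}
```

Usage would look like `same_md5 /usr/local/bin/runner-disk-guard.sh scripts/runner-infra/runner-disk-guard.sh && echo "deployed copy matches repo"`, run with enough privileges to read the deployed file.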
Test plan
🤖 Generated with Claude Code