fix(snapshot): fall back to non-sparse tar when segment map exceeds PAX cap #23
Merged
Go's archive/tar caps the encoded PAX block at maxSpecialFileSize =
1<<20 (see archive/tar/format.go). Our sparse encoding stuffs the
entire sparse-segment list into a single COCOON.sparse.map PAX record,
which is fine for typical disk images but trips the cap when the
underlying file is highly fragmented.
Reproduced 2026-05-05 against a live cocoon Windows VM running the
simular-pro-agent-runtime 1.8.0 build (an Electron app, signed in over a
real Firebase WebSocket): the agent's V8 heap, IPC buffers and WebSocket
pools fragment guest memory enough that memory-ranges yields tens of
thousands of sparse segments. The resulting JSON exceeds 1MB, and
`cocoon snapshot export` fails with:
Error: write archive: write header memory-ranges:
archive/tar: header field too long
vk-cocoon then retries the push in an indefinite loop (see the
project_cocoon_sparse_pax_overflow memory note in the vm-service tree),
silently breaking hibernate.
The fix detects the overflow up front (mapJSON > ~800KB, well below
the 1MB cap) and falls back to writing the file as a regular,
non-sparse tar entry. This loses the sparse-export size advantage on
the affected file (memory-ranges can be GB-scale), but those files are
not the dominant size driver in a snapshot — and a successful larger
push beats a hung loop. Empirical Go-tar limit measured: 30k segments
(~736KB JSON) succeed, 50k (~1.2MB) fail.
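The up-front check amounts to "marshal the map, compare its byte length to a guard well below 1<<20". The sketch below assumes a hypothetical `segment` type with made-up JSON field names and interprets "~800KB" as `800 * 1024`. cocoon's real segment type, map encoding and exact constant are not shown in this change.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// segment is a hypothetical stand-in for one data extent of a sparse file.
type segment struct {
	Offset int64 `json:"offset"`
	Length int64 `json:"length"`
}

// sparseMapLimit approximates the "~800KB" guard; the exact value is an
// assumption. It sits well below archive/tar's 1<<20 PAX cap.
const sparseMapLimit = 800 * 1024

// chooseEncoding marshals the segment map and reports whether it is small
// enough to ship as a single PAX record. The encoded map is returned so the
// sparse path can reuse it without marshalling twice.
func chooseEncoding(segs []segment) (mapJSON []byte, useSparse bool, err error) {
	mapJSON, err = json.Marshal(segs)
	if err != nil {
		return nil, false, err
	}
	return mapJSON, len(mapJSON) <= sparseMapLimit, nil
}

// fragmented builds n 4 KiB extents spaced 8 KiB apart, imitating a heavily
// fragmented memory-ranges file.
func fragmented(n int) []segment {
	segs := make([]segment, n)
	for i := range segs {
		segs[i] = segment{Offset: int64(i) * 8192, Length: 4096}
	}
	return segs
}

func main() {
	for _, n := range []int{1_000, 50_000} {
		m, sparse, err := chooseEncoding(fragmented(n))
		if err != nil {
			panic(err)
		}
		fmt.Printf("%d segments -> %d bytes of map JSON, sparse=%v\n", n, len(m), sparse)
	}
}
```

For scale, the measurement above works out to roughly 24.5 bytes per segment (736KB / 30k). That puts the 1 MiB cap somewhere around 42k to 43k segments and an ~800KB guard around 33k. The hypothetical encoding here is wordier per entry, so its crossover comes at a lower segment count. Gating on encoded bytes rather than segment count keeps the check valid whatever the map format turns out to be.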
Reader path is unchanged. The fallback emits a standard tar entry that
existing extract logic already handles via the non-sparse code path.
Tests:
- TestTarFileMaybeSparse_FallsBackOnLargeMap — verifies the fallback
triggers when the map exceeds the cap and that no PAX sparse
records are written.
- TestTarFileMaybeSparse_FallbackRoundTrip — extract of a
fallback-encoded file matches the original byte-for-byte.
- TestTarFileMaybeSparse_PreservesSparsePathForSmallMaps — sanity
that small fragmentation still uses the sparse encoding.
Branch updated from 5cdec4b to c814b15.
This was referenced May 6, 2026
Fixes #24.
Summary
`cocoon snapshot export` fails with `archive/tar: header field too long` when `memory-ranges` is highly fragmented. The cause is Go's `archive/tar` 1MB cap (`maxSpecialFileSize = 1<<20`, see `archive/tar/format.go`) on the encoded PAX block: our sparse encoding packs the whole segment list into a single `COCOON.sparse.map` PAX record, and a sufficiently fragmented file produces JSON that overflows it.
This fix detects the overflow up front (`mapJSON` > ~800KB, well under the 1MB cap) and falls back to writing the file as a regular non-sparse tar entry. `memory-ranges` can be GB-scale, so we lose the sparse-export size advantage on the affected file, but a larger successful push beats an indefinitely hung loop.
The reader path is unchanged: the fallback emits a standard tar entry that the existing extract logic handles via the non-sparse path.
Repro
Run a cocoon Windows VM with simular-pro-agent-runtime 1.8.0 (an Electron app with a live Firebase WebSocket). The agent's V8 heap, IPC buffers and WebSocket pools fragment guest memory enough that memory-ranges yields tens of thousands of sparse segments, and the resulting sparse-map JSON exceeds 1MB.
vk-cocoon then loops the push indefinitely (only the metadata blobs reach epoch; the memory layer is never PUT), so vm-service hibernate never observes phase=Suspended and the API caller times out. With this fix, export emits a non-sparse memory-ranges entry and the push completes normally.
Empirical Go-tar limit measured before/after this change:
- 30k segments (~736KB JSON): succeeds.
- 50k segments (~1.2MB JSON): `header field too long` (without fix) → fallback (with fix).
Test plan
- `TestTarFileMaybeSparse_FallsBackOnLargeMap`: verifies the fallback triggers when the map exceeds the cap and that no PAX sparse records are written.
- `TestTarFileMaybeSparse_FallbackRoundTrip`: extract of a fallback-encoded file matches the original byte-for-byte (including the holes, which are serialised as zeros).
- `TestTarFileMaybeSparse_PreservesSparsePathForSmallMaps`: sanity check that ordinary fragmentation still uses the sparse encoding.
- `utils` package suite passes on linux/amd64 (verified by running the cross-compiled test binary on a cocoon node).
🤖 Generated with Claude Code