
fix(snapshot): fall back to non-sparse tar when segment map exceeds PAX cap #23

Merged: CMGS merged 1 commit into master from fix/sparse-tar-pax-overflow on May 5, 2026.

@tonicmuroq commented May 5, 2026

Fixes #24.

Summary

`cocoon snapshot export` fails with `archive/tar: header field too long` when `memory-ranges` is highly fragmented. The cause is Go's `archive/tar` 1MB cap on the encoded PAX block (`maxSpecialFileSize = 1<<20`, see `archive/tar/format.go`). Our sparse encoding packs the whole segment list into a single `COCOON.sparse.map` PAX record, so a sufficiently fragmented file produces JSON that overflows it.

This fix detects the overflow up front (`mapJSON` > ~800KB, well under the 1MB cap, which applies to the whole encoded PAX block and so needs headroom for record framing and any other PAX records on the entry) and falls back to writing the file as a regular, non-sparse tar entry. Because `memory-ranges` can be GB-scale, we lose the sparse-export size advantage on the affected file, but a larger successful push beats an indefinitely hung loop. The reader path is unchanged: the fallback emits a standard tar entry that the existing extract logic handles via its non-sparse path.

Repro

Run a cocoon Windows VM with simular-pro-agent-runtime 1.8.0 (an Electron app with a live Firebase WebSocket). The agent's V8 heap, IPC buffers, and WebSocket pools fragment guest memory enough that `memory-ranges` yields tens of thousands of sparse segments, and the resulting JSON exceeds 1MB.

    $ sudo cocoon snapshot save --name probe-realjwt <vmid>
    INF snapshotting VM <vmid> ...
    INF saving snapshot data ...
    INF snapshot saved: <id>
    $ sudo cocoon snapshot export probe-realjwt -o /tmp/probe.tar
    INF exporting to /tmp/probe.tar ...
    Error: write archive: write header memory-ranges: archive/tar: header field too long

vk-cocoon then loops the push indefinitely (only the metadata blobs reach epoch; the memory layer is never PUT), so vm-service hibernate never observes phase=Suspended and the API caller times out. With this fix, export emits a non-sparse memory-ranges entry and the push completes normally.

Empirical Go-tar limit measured before/after this change:

| Segments | `mapJSON` size | Result |
|---|---|---|
| 30,000 | ~736 KB | OK (sparse path) |
| 50,000 | ~1.2 MB | `header field too long` (without fix) → fallback (with fix) |
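
A back-of-envelope reading of the table, taking KB as 1024 bytes and assuming the per-segment cost of the map stays roughly constant: each segment costs about 25 bytes of JSON (the 50k row gives the same figure), which puts the ~800KB fallback threshold near 33k segments and the hard 1 MiB cap near 42k. That is consistent with 30k passing and 50k failing.

```math
\begin{aligned}
\frac{736 \times 1024\ \text{B}}{30{,}000} &\approx 25\ \text{B per segment} \\
\frac{800 \times 1024\ \text{B}}{25\ \text{B per segment}} &\approx 32{,}800\ \text{segments (fallback threshold)} \\
\frac{2^{20}\ \text{B}}{25\ \text{B per segment}} &\approx 41{,}900\ \text{segments (1 MiB cap)}
\end{aligned}
```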

Test plan

  • TestTarFileMaybeSparse_FallsBackOnLargeMap — verifies the fallback triggers when the map exceeds the cap and no PAX sparse records are written.
  • TestTarFileMaybeSparse_FallbackRoundTrip — extract of a fallback-encoded file matches the original byte-for-byte (including the holes that get serialised as zeros).
  • TestTarFileMaybeSparse_PreservesSparsePathForSmallMaps — sanity that ordinary fragmentation still uses the sparse encoding.
  • Full utils package suite passes on linux/amd64 (verified by running the cross-compiled test binary on a cocoon node).

🤖 Generated with Claude Code

fix(snapshot): fall back to non-sparse tar when segment map exceeds PAX cap

Go's archive/tar caps the encoded PAX block at maxSpecialFileSize =
1<<20 (see archive/tar/format.go). Our sparse encoding stuffs the
entire sparse-segment list into a single COCOON.sparse.map PAX record,
which is fine for typical disk images but trips the cap when the
underlying file is highly fragmented.

Reproduced 2026-05-05 against a live cocoon Windows VM running the
simular-pro-agent-runtime 1.8.0 build (Electron app signed in over a
real Firebase WebSocket): the agent's V8 heap + IPC + WebSocket pools
fragment the guest memory enough that memory-ranges yields tens of
thousands of sparse segments; the resulting JSON exceeds 1MB and
`cocoon snapshot export` fails with:

    Error: write archive: write header memory-ranges:
        archive/tar: header field too long

vk-cocoon then loops indefinitely retrying the push (see the
project_cocoon_sparse_pax_overflow memory note in the vm-service tree),
silently breaking hibernate.

The fix detects the overflow up front (mapJSON > ~800KB, well below
the 1MB cap) and falls back to writing the file as a regular,
non-sparse tar entry. This loses the sparse-export size advantage on
the affected file (memory-ranges can be GB-scale), but those files are
not the dominant size driver in a snapshot, and a successful larger
push beats a hung loop. Empirical Go-tar limit measured: 30k segments
(~736KB JSON) succeed, 50k (~1.2MB) fail.

Reader path is unchanged. The fallback emits a standard tar entry that
existing extract logic already handles via the non-sparse code path.

Tests:
  - TestTarFileMaybeSparse_FallsBackOnLargeMap — verifies the fallback
    triggers when the map exceeds the cap and that no PAX sparse
    records are written.
  - TestTarFileMaybeSparse_FallbackRoundTrip — extract of a
    fallback-encoded file matches the original byte-for-byte.
  - TestTarFileMaybeSparse_PreservesSparsePathForSmallMaps — sanity
    that small fragmentation still uses the sparse encoding.

Development

Successfully merging this pull request may close these issues.

snapshot export fails with "header field too long" on highly fragmented memory-ranges
