Introduce explicit Role classification on StorageConfig so the rest of the codebase can stop inferring disk type from RO/Serial/filename. Adds MountPoint, FSType, DirectIO fields used by the upcoming user data-disk feature, and a runtime validator that enforces Role-RO consistency, data-disk fstype/mount/serial rules, and DirectIO scope. VMConfig grows a transient DataDisks slice that the CLI populates from --data-disk and Create consumes when calling PrepareDataDisks. This is the type-layer scaffold; producers and classifiers switch in follow-up commits.
Double-write phase: every site that constructs a StorageConfig now sets
Role alongside the existing RO/Serial fields. Classifiers still consume
RO/Serial; the next commit flips them to Role atomically.
Producers updated:
- images/oci, images/cloudimg: base layers tagged Layer
- hypervisor/utils.PrepareOCICOW: COW tagged COW
- cloudhypervisor.prepareCloudimg: overlay COW, cidata Cidata
- cloudhypervisor.ensureCloneCidata: appended cidata Cidata
- cloudhypervisor.rebuildStorageConfigs (clone reconstruction): Role
inferred from chDisk via roleFromCHDisk
- firecracker.rebuildCloneStorage: layers Layer, COW
The CH/FC clone-time reconstructions need Role here (not in a later
commit) because once classifiers switch to Role, configs without Role
would silently misbehave on the clone path.
All disk classification now reads StorageConfig.Role instead of inferring
from RO, Serial==CowSerial, or basename matching. With the producer
double-write in place, this is a complete swap.
- args.activeDisks: skips Role==Cidata; isCidataDisk deleted
- helper.ReverseLayerSerials: filters Role==Layer
- clone.updateCOWPath: matches Role==COW; the directBoot/non-directBoot
branch collapses since Role is unambiguous
- clone.updateCloneCidataPath, ensureCloneCidata: Role==Cidata
- clone tests retargeted to Role
- hypervisor.ExtractBlobIDs / VerifyBaseFiles: Role==Layer
- firecracker.buildCmdline: nLayers via Role==Layer
- firecracker.withSourceCOWLocked: Role==COW
- firecracker.rebuildCloneStorage iterates by Role
- firecracker.saveSnapshotMeta value-copies the StorageConfig so Role
survives serialization (avoids losing newly added fields again)
- cmd/vm/run.printPostCloneHints: cloudimg detection via ImageType
- cmd/vm/debug.printFCDebug: nLayers via Role
User-facing flag --data-disk lands. Each value is parsed as
size=20G[,name=...][,fstype=ext4|none][,mount=/mnt/x][,directio=on|off|auto];
omitted name is filled with dataN, omitted mount with /mnt/<name>, omitted
fstype with ext4. fstype=none requires empty mount. Names cap at 20 chars
to match Linux's virtio by-id truncation, and the cocoon- prefix is
reserved.
hypervisor.PrepareDataDisks creates each <runDir>/data-<name>.raw via
OpenFile+Truncate (the PrepareOCICOW pattern; os.Truncate alone would not
create the file), then reuses InitCOWFilesystem for the mkfs.ext4 step.
fstype=none leaves the file unformatted.
Both backends append data disks after their writable rootfs disk:
- CH OCI: layers... + cow + data...
- CH cloudimg: overlay + data... + cidata
(cidata stays last so activeDisks() filtering after
first boot doesn't shift devPath of data disks)
- FC OCI: layers... + cow + data...
Snapshot/clone/restore still see only the new disks as raw files; the
follow-up commits make them roundtrip correctly.
CH writes a cocoon.json sidecar inside every snapshot tar that mirrors config.json's disk shape — same order, same length — but carries the cocoon-only fields (Role, MountPoint, FSType, DirectIO) that CH's schema cannot. The sidecar is built by parsing the snapshot's just-written config.json, then looking each disk path up in rec.StorageConfigs.

This deliberately avoids activeDisks(rec): a cloudimg VM snapshotted between FirstBooted=true and the next stop still has cidata attached in CH's view, so config.json shows it; activeDisks(rec) would not, breaking the sidecar/config.json length invariant patchCHConfig depends on.

Clone reads the sidecar, validates it through ValidateStorageConfigs (it may come from an imported, untrusted snapshot), then proceeds with StorageConfigs already carrying full Role/MountPoint/FSType. The old rebuildStorageConfigs helper and the roleFromCHDisk inference are removed; the sidecar makes them unnecessary. The TestRebuildStorageConfigs fixture is dropped along with them. direct.cloneSnapshotFiles needs no change: cocoon.json falls into the Meta classifier and is plain-copied into the clone's runDir.
Snapshot now reflinks each Role==Data disk into the staging dir during the pause window (same window as COW) so the tar carries them and streamed/direct clone places them at the new runDir. CH and FC follow identical patterns; data disks are not subject to FirstBooted filtering, so they're enumerated directly off rec.StorageConfigs rather than via activeDisks(). cleanSnapshotFiles matchers in both backends' direct.go now include data-*.raw so a restore (which goes kill-VM → clean → reflink) cannot leave stale data disks from a previous incarnation alongside the freshly extracted ones. CH also cleans the cocoon.json sidecar; FC already had its sidecar in the matcher under a different name.
withSourceCOWLocked only held the COW lock; with data disks added, a concurrent snapshot reflinking those same files races against a clone that renames+symlinks them via createDriveRedirects. The new helper collects every Role==COW || Role==Data path, sorts by string, and locks them sequentially via withCOWPathLocked. The dictionary order guarantees concurrent callers acquire locks in the same sequence, so deadlock is impossible. recoverStaleBackup runs once per path after its lock is held, matching the previous semantics for the COW. clone.go's call site switches to the new helper; snapshot.go replaces its bare withCOWPathLocked(cowPath, ...) the same way. The old withSourceCOWLocked is removed.
Add hypervisor.ValidateSnapshotIntegrity (sidecar runs through
ValidateStorageConfigs and every Role==Data disk has its data-<serial>.raw
present) and hypervisor.ValidateRoleSequence (snapshot is a strict prefix
of rec; only allowed extension is trailing cidata). Each backend wraps
these into a preflight that runs BEFORE killing the live VM, so a
malformed/imported snapshot can't cost an outage.
Clone-side fixes:
- CH restorePatchStorageConfigs filters Role==Cidata when
ensureCloneCidata appended one (replaces tail-slice that broke when
data disks shifted cidata away from the tail in the unfiltered list)
- CH buildStateReplacements iterates min(len(chCfg), len(storageConfigs))
so the +1 cidata append no longer wipes path replacements
- CH updateDataDiskPaths rewrites Role==Data Path to the clone's runDir
using Serial as the basename source
- CH ensureCloneCidata calls types.ValidateStorageConfigs after the
cidata append, since sidecar-time validation can't see the appendee
- FC rebuildCloneStorage rebuilds in original sidecar order, rewriting
COW + Data paths to the clone runDir, keeping Layer paths verbatim,
and refusing the impossible Role==Cidata case explicitly
Restore-side fixes:
- CH/FC streaming Restore preflight at staging before kill
- CH/FC DirectRestore preflight at srcDir before kill (required
splitting prepareRestore into resolve + kill so preflight sits between)
- CH restoreAfterExtract uses len(sidecar) for the prefix slice instead
of len(snapshotCfg.Disks); equivalent today but explicit about which
file is the source of truth
Cloud-init mounts land in user-data so cloudimg+CH guests auto-mount their data disks. Each spec is a YAML-quoted 6-tuple [device, mount, fstype, options, freq, passno]; quoting goes through the same yamlQuote function as the password field, so an imported sidecar can't break out of the YAML scalar. The default options string is "defaults,nofail" so a missing or corrupt disk doesn't keep the guest from booting.

CH generateCidata now accepts the storageConfigs slice so it can build the mount list itself: Role==Data entries with a non-empty MountPoint and a known FSType become mounts: rows (none → the guest is responsible for mkfs+mount, so skip). The device path uses /dev/disk/by-id/virtio-<serial>, which CH exposes via chDisk.Serial. FC has no serial field on fcDrive, so this path doesn't apply there — Firecracker data disks remain /dev/vdX-only.

DirectIO becomes per-disk on CH: storageConfigToDisk and patchDisks both prefer sc.DirectIO when non-nil, else fall back to !sc.RO && !noDirectIO. A workload with mixed cache requirements (e.g. database + build cache on one VM) can now opt each disk in or out independently. FC has no equivalent knob, so configureVM emits a warning the first time DirectIO is seen on a data disk and proceeds with IoEngine=Async.
ValidateStorageConfigs now runs at FinalizeCreate (catches buggy producers before persisting), Start (catches DB corruption), and Snapshot (catches drift before serialization). Clone and Restore already validate via the snapshot preflight, so the matrix is now complete. cmd/vm/debug also warns when --data-disk is set: debug only prints the launcher command, it never runs PrepareDataDisks, so silently dropping the flag would mislead users into thinking the printed command would materialize data disks.
- types/storage_test: ValidDataDiskName regex/length/reserved-prefix cases; ValidateStorageConfigs invariants per Role (RO mismatch, bad fstype, mount-with-fstype-none, non-absolute mount, DirectIO on non-data, nil entry, missing role).
- cmd/core/helpers_test: parseDataDiskSpec covers minimal/maximal flags, directio on/off/auto tristate, MountPointSet distinguishing mount=<empty> from omitted, plus the rejection cases (size below 16MiB, reserved prefix, bad name, xfs, unknown key). normalizeDataDiskSpecs covers auto-name skip-already-used, default fstype/mount fill-in, fstype=none rejecting mount, explicit empty mount preservation, and duplicate-name rejection.
- metadata/metadata_test: the mounts: section appears with Mounts non-empty, is omitted otherwise, and yamlQuote escapes single quotes (a hostile imported sidecar can't break out of the YAML scalar).
- clone_test: BuildStateReplacements no longer wipes the prefix when ensureCloneCidata leaves storageConfigs one longer than chCfg.Disks. RestorePatchStorageConfigs drops only the appended cidata, preserves trailing data disks, and is a no-op for direct-boot or snapshot-had-cidata cases.
- types: export FSTypeExt4 / FSTypeNone constants so cmd/core, hypervisor, and cloudhypervisor stop repeating the literals (goconst)
- hypervisor.utils: fold MinDataDiskSize into the existing const block per code-style "one block per file"
- shadow: rename inner err to vErr at every site where ValidateStorageConfigs shadowed an outer err (govet shadow)
- cloudhypervisor.restore: drop the unused prepareRestore helper now that DirectRestore inlines resolve+preflight+kill (unused)
- gofumpt: trim a stray blank line in firecracker/clone.go

make lint clean on linux + darwin; tests pass.
/simplify pass:
- DataDiskBaseName centralizes the data-<serial>.raw filename, used by PrepareDataDisks, ValidateSnapshotIntegrity, BaseConfig.DataDiskPath, CH updateDataDiskPaths, and FC rebuildCloneStorage instead of each reconstructing the same string
- IsDataDiskFile replaces the inline strings.HasPrefix+HasSuffix in CH and FC cleanSnapshotFiles matchers
- ReflinkDataDisks consolidates the identical Role==Data reflink loop that lived in both CH snapshot.go and FC snapshot.go (8 lines × 2)
- PrepareOCICOW now calls createSparseFile, dropping the duplicate OpenFile+Truncate+Close ladder
Three findings from review, all confirmed real:
P1: CH/FC preflight let snapshots through even when CH state.json,
memory-range-*, FC vmstate, mem, COW or cidata files were missing.
killForRestore would already have run by the time vm.restore /
snapshot-load actually noticed, costing an outage on a malformed tar.
- hypervisor.ValidateSnapshotIntegrity now stats every snapshot-resident
file in the sidecar (Role==COW, Cidata, Data) by basename. Layers are
shared blobs, not in srcDir, so they're skipped.
- cloudhypervisor.validateSnapshotIntegrity adds state.json + at least
one memory-range-* check on top of the common preflight.
- firecracker.preflightRestore adds vmstate + mem checks on top of the
common preflight (cow.raw is covered by the sidecar walk now).
P1: FC restore / DirectRestore only locked the COW path. With data disks,
a previous clone that crashed mid-redirect could leave a stale
data-<x>.raw.cocoon-clone-backup. Restore happily wrote a new
data-<x>.raw next to it, then the next clone or snapshot acquired that
disk's lock, ran recoverStaleBackup, and renamed the backup back over
the freshly restored data — silently corrupting it.
- Both restore paths switch to withSourceWritableDisksLocked(rec.
StorageConfigs, ...), which already runs recoverStaleBackup on every
writable disk (COW + Data) in dictionary order. Stale backups get
healed before the restore touches the disks, and held locks block
concurrent clone/snapshot from racing the swap.
writeSnapshotMeta builds the sidecar by walking chCfg.Disks in order, so sidecar[i] and chCfg.Disks[i] must agree on Path and ReadOnly. We already validated count, structural fields, file presence, and role sequence, but left the per-index identity unchecked. A drifted sidecar (tampered, partial overwrite, snapshot built by an older cocoon version) would have slipped through preflight; clone's patchCHConfig would then have written each sidecar entry's path into the wrong disk slot, leaving CH device state pointing at the wrong files. Per-index Path + ReadOnly comparison closes the gap; cost is one short loop. Role is not on chCfg, so it stays out of this check.
Senior /code review pass found the preflight helpers (ValidateSnapshotIntegrity, ValidateRoleSequence, DataDiskBaseName, IsDataDiskFile) had no direct unit coverage despite being security-critical: they're what gate restore against tampered or imported sidecars. They're pure functions, so coverage is cheap. Tests cover: file presence per Role, layer entries skipped (shared blob not in srcDir), structural rejection delegated to ValidateStorageConfigs, prefix-only role sequence with allowed trailing cidata, sidecar-longer-than-rec rejection, and the data disk filename helpers.
Senior /code review (full repo walk, 178 files) flagged three places where this branch had introduced or aggravated the "exported above unexported" rule from SKILL.md:
- cloudhypervisor/helper.go: ReverseLayerSerials sat between two private helpers (hasMemoryRangeFile and vmAPI). Lifted it to the top of the helper section, right after the var block.
- hypervisor/utils.go: createSparseFile and snapshotResidentBasename are private callees of PrepareDataDisks / PrepareOCICOW / ValidateSnapshotIntegrity. Moved both to the bottom of the file so the exported API stays contiguous.
- cmd/core/helpers.go: parseDataDiskFlags + parseDataDiskSpec + normalizeDataDiskSpecs sat between two exported VMConfig flag builders. Moved to the bottom of the private-helper region next to mergeResourceFlags.

The two remaining violations the audit found (hypervisor/backend.go's SocketPath and firecracker/create.go's EnsureVmlinux) predate this branch and are out of scope.
SKILL.md says "One single const block per file, placed at the top (after imports)" — strictly read, that means before any type declarations. The codebase had eight files with type-before-const; with const moved up, the typed-constants pattern now forward-references the type name. Fixed across the whole repo, not just this branch:
- types/storage.go, types/vm.go
- hypervisor/utils.go, hypervisor/cloudhypervisor/helper.go
- cmd/images/handler.go
- progress/oci/oci.go, progress/cloudimg/cloudimg.go
- config/config.go

The two earlier-flagged baseline function-ordering complaints (hypervisor/backend.go SocketPath, firecracker/create.go EnsureVmlinux) were false positives from my first-pass scanner: it conflated methods with standalone functions. SKILL.md applies "exported above unexported" within each category, and both files transition cleanly from a method section (ending with a private method) to a standalone-function section (starting with a public function). No fix needed there; my v2 script confirms zero violations after the const moves.