feat(crux-b-02): GGUF→safetensors layout + metadata + PEFT classifier (3 of 4 FALSIFY at PARTIAL_ALGORITHM_LEVEL)#966
Open
noahgift added a commit that referenced this pull request Apr 21, 2026
…#967)

test_mmap_writer_no_blocking panicked on PR #962 workspace-test (self-hosted runner) at 30.168ms, roughly a 0.6% overshoot of the 30ms debug target.

Root cause: shared-tenant wall-clock contention on the self-hosted runner, same genre as the earlier quantize perf flake chain (#955/#956/#957).

Fix: raise the debug-build threshold to 100ms with a comment citing the observed flake. A catastrophic regression (>3× slower) still trips the assertion. Release target (500us) unchanged.

Main CI andon rule (CLAUDE.md): no #[ignore] for flakes; fix the threshold so the signal survives tenant contention.

Unblocks auto-merge queue: #962 #963 #964 #965 #966.
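The threshold change described in that commit can be pictured roughly as follows. This is a minimal sketch, not the code from the referenced commit: `write_budget` and `within_budget` are hypothetical names, and only the 100ms and 500us figures come from the commit message.

```rust
use std::time::Duration;

/// Hypothetical helper: latency budget for the current build profile.
/// Only the two numeric targets are taken from the commit message.
fn write_budget(debug_build: bool) -> Duration {
    if debug_build {
        // Raised from 30ms after an observed 30.168ms flake on the
        // shared self-hosted runner; a >3x regression still fails.
        Duration::from_millis(100)
    } else {
        // Release target, unchanged.
        Duration::from_micros(500)
    }
}

/// Hypothetical assertion predicate a perf test could call.
fn within_budget(elapsed: Duration, debug_build: bool) -> bool {
    elapsed <= write_budget(debug_build)
}
```

A test would presumably pass `cfg!(debug_assertions)` as `debug_build`, so the same assertion covers both profiles without an `#[ignore]`.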
noahgift added a commit that referenced this pull request Apr 21, 2026
…LI + 20 e2e tests

Wires the GGUF→safetensors classifier from PR #966 into a reachable `apr gguf-safetensors-lint --observation-file <path>` CLI and adds a 20-test e2e falsification harness, satisfying all four CRUX-SHIP-001 merge gates (g1 classifier green, g2 CLI reachable, g3 e2e runs, g4 contract discharged with cli_path + e2e_path + e2e_tests on every PARTIAL_ALGORITHM_LEVEL FALSIFY gate).

Contract crux-B-02-v1.yaml bumped 1.1.0 → 1.2.0:
- updated: "2026-04-21"
- evidence blocks on FALSIFY-CRUX-B-02-{001,003,004} gain cli_path + e2e_path + e2e_tests
- ship_discipline block added with 4 merge-gate receipts + partial_scope_rationale documenting FALSIFY-CRUX-B-02-002 stays NOT_DISCHARGED (BLOCKER-FIXTURE-ABSENT: no Q4_K_M dequant numerical harness yet)
- pv validate PASS

CLI command count 62 → 63 (apr-cli-commands-v1.yaml scope string + commands[] entry + tests/cli_commands.rs registered_commands + extended_commands.rs enum variant + dispatch_analysis.rs arm).

Observation schema:

    {
      "layout":   { "listing": [...] },                                      // FALSIFY-001
      "metadata": { "kv": {...},
                    "expected_outcome": "ok|missing_key|wrong_type" },       // FALSIFY-003
      "peft":     { "tensor_names": [...], "target_modules": [...],
                    "expected_outcome": "resolved|unresolved" }              // FALSIFY-004
    }

Each top-level key is optional. The observer asserts the expected outcome, so classifier errors with a pre-declared expectation (e.g. missing_key) still exit 0, while a classifier outcome that disagrees with the expectation exits non-zero and stamps FALSIFY-CRUX-B-02-{001,003,004} on stderr.

cargo test -p apr-cli --test falsification_crux_b_02: 20 pass / 0 fail
cargo test -p apr-cli --test cli_commands: 6 pass / 0 fail

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
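The observer-asserts-expected-outcome rule from that commit message can be sketched for the `metadata` key as below. This is an illustration under assumptions, not the CLI's actual code: the enum, the function names, and the concrete exit code `1` are hypothetical (the commit message only says "non-zero"); the three outcome strings and the FALSIFY id come from the schema above.

```rust
/// Outcomes named in the observation schema's `metadata.expected_outcome`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MetadataOutcome {
    Valid,      // "ok"
    MissingKey, // "missing_key"
    WrongType,  // "wrong_type"
}

/// Hypothetical parser for the `expected_outcome` string.
fn parse_expected(s: &str) -> Option<MetadataOutcome> {
    match s {
        "ok" => Some(MetadataOutcome::Valid),
        "missing_key" => Some(MetadataOutcome::MissingKey),
        "wrong_type" => Some(MetadataOutcome::WrongType),
        _ => None,
    }
}

/// Agreement between expectation and classifier passes, even when both
/// are error outcomes; disagreement carries the FALSIFY id.
fn check_metadata(expected: MetadataOutcome, actual: MetadataOutcome) -> Result<(), String> {
    if expected == actual {
        Ok(())
    } else {
        Err(format!(
            "FALSIFY-CRUX-B-02-003: expected {:?}, classifier produced {:?}",
            expected, actual
        ))
    }
}

/// Maps the check to a process exit code (1 is an assumed non-zero value)
/// and stamps the FALSIFY id on stderr on disagreement.
fn exit_code(result: &Result<(), String>) -> i32 {
    match result {
        Ok(()) => 0,
        Err(msg) => {
            eprintln!("{msg}");
            1
        }
    }
}
```

The point of this shape is that a negative fixture (one that is supposed to be missing a key) is a passing observation, so the e2e harness can exercise failure paths without special-casing exit codes.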
3303012 to d177d9a
…n + PEFT target-module classifier (3 of 4 FALSIFY at PARTIAL_ALGORITHM_LEVEL)
Three pure classifiers in crates/apr-cli/src/commands/gguf_to_safetensors.rs:
1. hf_required_files() + missing_hf_files(listing) — canonical trio
{model.safetensors, config.json, tokenizer.json} that a converted
directory must contain. Without these filenames, no downstream
HF byte-level load can begin.
2. translate_gguf_metadata(gguf_kv) -> HfLlamaConfig — pure mapping
from GGUF llama.* keys onto the HF config.json fields. Missing
keys return Err(MissingKey). Silently defaulting instead would
let from_pretrained succeed with the wrong layer count and
produce garbage.
3. peft_target_modules_resolve(tensor_names, target_modules) —
substring-match check that every PEFT target (q_proj, v_proj, ...)
resolves to at least one tensor. Mirrors PEFT's own name-matching
rule. Unresolved targets are flagged before the attach call.
Contract crux-B-02-v1.yaml v1.0.0-draft -> v1.1.0 (draft -> partial):
- FALSIFY-001 (required files present): PARTIAL_ALGORITHM_LEVEL (5 tests)
- FALSIFY-002 (dequant numerical bound): NOT_DISCHARGED (needs Q4_K_M harness)
- FALSIFY-003 (transformers loads output): PARTIAL_ALGORITHM_LEVEL (8 tests)
- FALSIFY-004 (PEFT LoRA attaches): PARTIAL_ALGORITHM_LEVEL (5 tests)
18 / 18 tests pass. pv validate: 0 errors, 0 warnings.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
d177d9a to 98ece26
Summary
Discharges CRUX-B-02 (`apr convert --format safetensors` producing a HuggingFace-loadable directory) at PARTIAL_ALGORITHM_LEVEL for 3 of 4 FALSIFY gates.
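Three pure classifiers in `crates/apr-cli/src/commands/gguf_to_safetensors.rs`: `hf_required_files()` + `missing_hf_files(listing)`, `translate_gguf_metadata(gguf_kv)`, and `peft_target_modules_resolve(tensor_names, target_modules)`.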
Contract status
`contracts/crux-B-02-v1.yaml` v1.0.0-draft → v1.1.0, status `draft` → `partial`.
Full discharge is blocked on a Q4_K_M dequant vs reference f32 harness and on uv-run transformers/peft end-to-end harnesses.
Test plan
🤖 Generated with Claude Code