chore(rust): walker semantics in process_value — stringified-JSON + opaque strings#290
Merged
chopratejas merged 2 commits intomainfrom Apr 28, 2026
Merged
Conversation
…paque strings
Stage 3c.2 PR5. Closes the gap between the public `crush()` API and
the standalone `DocumentCompactor` walker. Augments
`SmartCrusher::process_value`'s String branch to mirror the walker's
two String cases:
1. Stringified-JSON containers: parse, recurse via process_value,
re-emit. The wrapping field stays a string but its contents are
processed end-to-end. Special-cases the lossless-compaction
path (when recursion returns Value::String) to avoid double-
JSON-encoding.
2. Opaque blobs (long base64 / HTML / long-text strings):
substitute with `<<ccr:HASH,KIND,SIZE>>` markers — same format
as walker.rs and PR4's lossy CCR-Dropped markers, so downstream
consumers can pattern-match regardless of which path emitted.
Type chore so the package version doesn't bump.
# Why this matters
After PR4, calling `SmartCrusher::new()` and then `crush(json_blob)`
gets lossless-first compaction on top-level arrays. But for tool
outputs that wrap a JSON-encoded payload INSIDE a string field, or
contain opaque blobs (base64-encoded files, HTML chunks), the public
API was a no-op. The walker handled these but only via its own
entry point; users calling `crush()` never hit it.
PR5 brings walker semantics into the public path. Same vision the
user described early in this stage:
> JSON within JSON, opaque payloads, multi-layered
now works for `crush()` callers without any extra setup.
# Implementation (~80 lines)
- `process_string` method on SmartCrusher: dispatches to JSON-recurse
/ CCR-substitute / passthrough.
- `try_parse_json_container_str`: cheap parse-only-containers helper.
- `ccr_marker_for_string` + `opaque_kind_label` + `humanize_bytes`:
marker formatting matching walker.rs byte-for-byte.
Reuses every PR2 primitive — no new traits, no new IR, no new
abstractions.
# Tests (6 new)
- Short string passthrough (no false positives)
- Stringified-JSON array recurses (50 items inside a string field)
- Opaque base64 blob → CCR marker substitution
- Top-level plain text passthrough (crush(plain_text) unchanged)
- Short JSON-looking strings unchanged (no false-positive opaque)
- Helper parses only containers, not bare scalars
303/303 smart_crusher tests pass (was 297 → +6 PR5).
185/185 Python tests. make ci-precheck green.
Module: crates/headroom-core/src/transforms/smart_crusher/crusher.rs
….toml The action was set to @stable, which installs whatever the latest stable is (1.95.0 right now). Then maturin invokes cargo, which reads rust-toolchain.toml and re-resolves to "1.95.0 + clippy + rustfmt". rustup treats stable and 1.95.0 as distinct toolchain identities and refuses the second install with: failed to install component 'clippy-preview-x86_64-unknown-linux-gnu', detected conflict: 'bin/cargo-clippy' This was intermittent across the matrix (only test (3.10) tripped on the most recent run; others got lucky on cache state). Pinning the action ref to 1.95.0 makes both sides ask for the exact same toolchain identity, so the second install is a no-op and the conflict can't fire. Bump procedure stays the same: when rust-toolchain.toml's channel changes, update these refs in lock-step. Plugin manifests auto-bumped 0.11.0 -> 0.13.2 by sync-plugin-versions hook (unrelated to the workflow fix).
6 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Stage 3c.2 PR5. Closes the gap between the public `crush()` API and the standalone `DocumentCompactor` walker. Augments `SmartCrusher::process_value`'s String branch to mirror the walker's two String cases:
Type: `chore` so the package version doesn't bump.
Why this matters
After PR4, `SmartCrusher::new()` does lossless-first compaction on top-level arrays. But for tool outputs that wrap a JSON-encoded payload inside a string field (extremely common with MCP, function-call APIs, etc.) or contain opaque blobs (base64-encoded files, HTML chunks), the public API was a no-op. The walker handled these — but only via its own entry point.
PR5 brings walker semantics into the public path. The user's original vision now works for `crush()` callers without any extra setup:
Implementation (~80 lines + tests)
Reuses every PR2 primitive — no new traits, no new IR, no new abstractions.
Tests (6 new)
303/303 smart_crusher tests pass (was 297 → +6 PR5). 185/185 Python tests. `make ci-precheck` green.
What's next