feat(context): aggressive token-engine — adaptive intensity, redundancy cache, summarize fallback #10
Merged
claudioemmanuel merged 2 commits into develop on Apr 7, 2026
Conversation
feat(context): aggressive token-engine — adaptive intensity, redundancy cache, summarize fallback

PR1 of two. Adds the new context optimization layer (src/context/) that sits between filter::compress and wrap.rs, attacking three sources of token waste that previously went unaddressed:

- Adaptive intensity: the bash header now reports the active level (Lite/Full/Ultra), and per-handler limits scale automatically as cumulative session usage approaches the compact_threshold budget. Floors are enforced so we never reduce to zero.
- Cross-call redundancy cache: identical compressed output within the last 8 calls is collapsed to a single reference line. Length equality guards against FNV-1a collisions; tiny outputs (<5 lines) are skipped.
- Summarize fallback: raw outputs over 500 lines (configurable) are replaced with a dense ≤40-line summary (top errors, top files, test summary, last 20 lines verbatim) instead of running through the per-handler truncation pipeline.
- SessionContext persists at sessions/context.json next to current.json with bounded ring buffers (32 calls, 256 files, 128 errors, 64 git refs). Hand-rolled flat-array JSON via the existing json_util — no serde, zero new dependencies.
- Cross-call hint: cat/head/tail/less/more/bat of a file already in context emits "# squeez hint: <path> already in context (Read tool, call #N)" without blocking execution.

All new behavior is opt-out via four new config keys (adaptive_intensity, context_cache_enabled, redundancy_cache_enabled, summarize_threshold_lines). Existing handlers and strategies are untouched; the wrap.rs diff is ~30 lines.

Tests: 167 passing (was 132). New integration tests cover intensity boundaries, cache round-trip, redundancy hit/miss, and the summarize fallback. Inline unit tests in each context module add ~25 more cases.

Benches: bench/run.sh stays at 12/12; new bench/run_context.sh exercises the wrap.rs pre/post pass end-to-end (3/3 passing). The CI workflow runs both.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
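The redundancy-cache behavior described above (FNV-1a 64 hash, length-equality collision guard, last-8-calls window, tiny outputs skipped) can be sketched as follows. This is a minimal illustration, not the squeez source: the names `RecentEntry`, `RECENT_WINDOW`, `MIN_LINES`, and `find_redundant` are hypothetical.

```rust
const RECENT_WINDOW: usize = 8; // only the last 8 calls are consulted
const MIN_LINES: usize = 5; // tiny outputs are not worth caching

/// FNV-1a 64-bit hash (offset basis 0xcbf29ce484222325, prime 0x100000001b3).
fn fnv1a_64(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325;
    for &b in bytes {
        h ^= b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    h
}

struct RecentEntry {
    call_no: u32,
    hash: u64,
    len: usize, // collision guard: equal hash AND equal byte length required
}

/// Returns the call number of an identical recent output, if any.
fn find_redundant(recent: &[RecentEntry], output: &str) -> Option<u32> {
    if output.lines().count() < MIN_LINES {
        return None; // skip tiny outputs
    }
    let (h, l) = (fnv1a_64(output.as_bytes()), output.len());
    recent
        .iter()
        .rev()
        .take(RECENT_WINDOW)
        .find(|e| e.hash == h && e.len == l)
        .map(|e| e.call_no)
}

fn main() {
    let out = "line1\nline2\nline3\nline4\nline5\n";
    let recent = vec![RecentEntry {
        call_no: 3,
        hash: fnv1a_64(out.as_bytes()),
        len: out.len(),
    }];
    assert_eq!(find_redundant(&recent, out), Some(3)); // hit: collapse to a reference line
    assert_eq!(find_redundant(&recent, "x\n"), None); // miss: too small to cache
    println!("ok");
}
```

The length check costs nothing and makes a false positive require both a 64-bit hash collision and an exact length match, which is why a non-cryptographic hash like FNV-1a is acceptable here.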
User requested maximum aggression by default. The Lite/Full tiers remain in the enum (forward-compat) but derive() now returns Ultra unconditionally when adaptive_intensity is enabled. To opt out of scaling entirely, set adaptive_intensity = false (falls back to Lite). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
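The behavior described in this commit can be sketched as below: the tiers stay in the enum for forward compatibility, but derivation short-circuits to Ultra whenever adaptive_intensity is enabled. The names `Intensity` and `derive_intensity` are illustrative, not necessarily the actual squeez identifiers.

```rust
#[allow(dead_code)] // Full is reserved for forward compatibility
#[derive(Debug, PartialEq, Clone, Copy)]
enum Intensity {
    Lite,  // no scaling (fallback when adaptive_intensity = false)
    Full,  // kept in the enum but currently never derived
    Ultra, // maximum aggression (current unconditional default)
}

fn derive_intensity(adaptive_enabled: bool) -> Intensity {
    if adaptive_enabled {
        // "Maximum aggression by default": budget-based tier selection
        // is intentionally bypassed for now.
        Intensity::Ultra
    } else {
        Intensity::Lite
    }
}

fn main() {
    assert_eq!(derive_intensity(true), Intensity::Ultra);
    assert_eq!(derive_intensity(false), Intensity::Lite);
    println!("ok");
}
```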
claudioemmanuel added a commit that referenced this pull request on Apr 7, 2026: feat(context): aggressive token-engine — adaptive intensity, redundancy cache, summarize fallback
This was referenced Apr 7, 2026
claudioemmanuel added a commit that referenced this pull request on Apr 7, 2026: feat(context): aggressive token-engine — adaptive intensity, redundancy cache, summarize fallback
Summary
PR1 of two implementing the squeez token-optimizer roadmap. Adds the context optimization layer (`src/context/`), a minimally-invasive engine that sits between `filter::compress` and `wrap.rs` and attacks three sources of token waste squeez previously ignored.

- Adaptive intensity: the bash header reports the active level (`Lite`/`Full`/`Ultra`), and per-handler limits (`max_lines`, `dedup_min`, `git_diff_max`, `docker_logs_max`, `find_max`, `summarize_threshold`) automatically scale as cumulative session usage approaches the `compact_threshold` budget. Floors are enforced so we never reduce to zero.
- Cross-call redundancy cache: identical compressed output within the recent window is collapsed to a single reference line: `[squeez: identical to <hash> at bash#<n> — re-run with --no-squeez]`. Length equality guards against FNV-1a collisions; tiny outputs (<5 lines) are skipped.
- Summarize fallback: raw outputs over the configurable line threshold are replaced with a dense summary instead of running through the per-handler truncation pipeline.
- `SessionContext` persists at `sessions/context.json` next to `current.json` with bounded ring buffers (32 calls, 256 files, 128 errors, 64 git refs). Hand-rolled flat-array JSON — no serde, zero new dependencies (still just `libc` on Unix).
- Cross-call hint: `cat`/`head`/`tail`/`less`/`more`/`bat` of a file already in context emits a one-line hint without blocking execution.

All new behavior is opt-out via four new config keys (`adaptive_intensity`, `context_cache_enabled`, `redundancy_cache_enabled`, `summarize_threshold_lines`). Existing handlers and strategies are untouched; the `wrap.rs` diff is ~30 lines.

Changes
- `src/context/` module: `intensity.rs`, `cache.rs`, `redundancy.rs`, `summarize.rs`, `hash.rs` (FNV-1a 64), `mod.rs`
- `src/commands/wrap.rs`: pre/post-pass insertion + `[adaptive: <Level>]` header tag
- `src/config.rs`: 4 new fields with defaults + INI parser arms
- `src/json_util.rs`: `extract_u64_array`, `u64_array`, `usize_array` helpers
- `bench/fixtures/`: `summarize_huge.txt`, `intensity_budget80.txt`, `context_crosscall_{1,2,3}.txt`
- `bench/run_context.sh`: end-to-end wrap-mode bench (3 scenarios)
- `bench/run.sh`: skips `context_crosscall_*` (handled by `run_context.sh`)
- `.github/workflows/ci.yml`: runs `bench/run_context.sh` after `bench/run.sh`
- `README.md`: documents the four new config keys + the intensity model

Test plan
- `cargo test`: 167 passing (was 132)
- `cargo build --release`: clean
- `bench/run.sh`: 12/12 fixtures pass (incl. new `summarize_huge` at 100% reduction and `intensity_budget80` at 99% reduction)
- `bench/run_context.sh`: 3/3 scenarios pass:
  - `summarize_huge` triggers the summary header; output ≤60 lines
  - `intensity_budget80` with a seeded `current.json` shows `[adaptive: Ultra]`
  - `context_crosscall_{2,3}` emit redundancy reference lines after `_1`

Risks & mitigations
- Redundancy cache window is bounded (`RECENT_WINDOW=8`); opt-out via `redundancy_cache_enabled=false`
- The `[adaptive: <Level>]` header tag keeps the active intensity level visible
- Only existing dependency is `libc` (Unix-only) in `Cargo.toml`

PR2 (memory compressor + caveman persona + `squeez update` + `track-result` hook) will follow on top of this branch.

🤖 Generated with Claude Code
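The bounded ring buffers that back `SessionContext` (32 calls, 256 files, 128 errors, 64 git refs) can be sketched as a capped push-evict queue. A minimal sketch assuming a `VecDeque`-based design; the actual squeez implementation may differ, and the `Ring` type here is hypothetical.

```rust
use std::collections::VecDeque;

/// Fixed-capacity ring: push evicts the oldest entry once the cap is hit.
struct Ring<T> {
    cap: usize,
    buf: VecDeque<T>,
}

impl<T> Ring<T> {
    fn new(cap: usize) -> Self {
        Ring { cap, buf: VecDeque::with_capacity(cap) }
    }

    fn push(&mut self, item: T) {
        if self.buf.len() == self.cap {
            self.buf.pop_front(); // evict oldest to stay bounded
        }
        self.buf.push_back(item);
    }

    fn len(&self) -> usize {
        self.buf.len()
    }
}

fn main() {
    // Mirror the 32-call buffer: 100 pushes never exceed the cap.
    let mut calls: Ring<u32> = Ring::new(32);
    for i in 0..100 {
        calls.push(i);
    }
    assert_eq!(calls.len(), 32); // bounded at cap
    assert_eq!(*calls.buf.front().unwrap(), 68); // oldest surviving call
    println!("ok");
}
```

Bounding every buffer at construction is what keeps `sessions/context.json` from growing without limit across a long session.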