
Promote develop to main — PR1+PR2 token engine#13

Closed
claudioemmanuel wants to merge 28 commits into main from develop

Conversation

@claudioemmanuel

Summary

Lands the full squeez token-optimizer roadmap (PR1 + PR2) on main:

  • Context engine — adaptive Ultra intensity (×0.3 all limits), cross-call redundancy cache (8-call window, FNV-1a + length guard), summarize fallback (>500 lines → ≤40-line dense report)
  • squeez compress-md — pure-Rust markdown compressor; Ultra mode with abbreviations; damage heuristic; auto-runs on every session start
  • Caveman Ultra persona — injected into session banner and copilot-instructions.md by default
  • squeez update — self-updater via curl + SHA-256 verification; atomic install
  • track-result hook — PostToolUse Read/Grep/Glob results feed SessionContext
  • Branch protection, auto-delete, PR governance workflows added
  • Co-Authored-By: Claude trailers removed from all commit messages

Benchmarks (macOS Apple Silicon)

| Fixture | Reduction | Latency |
| --- | --- | --- |
| 5K-line log (summarize) | -100% | 12ms |
| ps aux | -95% | 6ms |
| cargo build (Ultra) | -99% | 4ms |
| git log 200 commits | -70% | 3ms |
| docker logs | -73% | 4ms |

14/14 fixtures pass. All latency ≤15ms.

Test plan

  • cargo test — 18 integration tests pass (+ comprehensive unit suite)
  • bash bench/run.sh — 14/14 fixtures
  • bash bench/run_context.sh — 3/3 context scenarios

Closes #10 Closes #12

jhonatanjunio and others added 28 commits on April 6, 2026 at 11:00
Merge README updates into develop
Merge promote workflow fix into develop
feat: add Windows (Git Bash) support
…cy cache, summarize fallback

PR1 of two. Adds the new context optimization layer (src/context/) that
sits between filter::compress and wrap.rs, attacking three sources of
token waste that previously went unaddressed:

- Adaptive intensity: bash header now reports the active level
  (Lite/Full/Ultra) and per-handler limits scale automatically as
  cumulative session usage approaches the compact_threshold budget.
  Floors enforced so we never reduce to zero.
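
  As a rough illustration of the scaling-with-floors idea described above (function names and thresholds here are hypothetical, not squeez's actual code):

  ```rust
  // Hypothetical sketch: per-handler limits shrink as cumulative session usage
  // approaches the budget, but a floor keeps every limit above zero.
  fn scaled_limit(base: usize, usage_ratio: f64) -> usize {
      // Ultra intensity multiplies limits by 0.3 near the budget (illustrative tiers).
      let factor = if usage_ratio >= 0.8 {
          0.3
      } else if usage_ratio >= 0.5 {
          0.6
      } else {
          1.0
      };
      let scaled = (base as f64 * factor) as usize;
      scaled.max(1) // floor: never reduce a limit to zero
  }

  fn main() {
      assert_eq!(scaled_limit(100, 0.9), 30); // Ultra tier
      assert_eq!(scaled_limit(1, 0.9), 1);    // floor holds
      assert_eq!(scaled_limit(100, 0.1), 100); // no scaling far from budget
  }
  ```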

- Cross-call redundancy cache: the same compressed output within the
  last 8 calls is collapsed to a single reference line. Length-equality
  guarded against FNV-1a collisions; tiny outputs (<5 lines) skipped.
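
  A minimal sketch of the hash-plus-length-guard keying, assuming a standard 64-bit FNV-1a; the real cache logic is more involved:

  ```rust
  // Illustrative 64-bit FNV-1a; offset basis and prime are the standard constants.
  fn fnv1a_64(data: &[u8]) -> u64 {
      let mut h: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
      for &b in data {
          h ^= b as u64;
          h = h.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
      }
      h
  }

  /// Two outputs count as duplicates only if both hash AND length match,
  /// which cheaply rules out most FNV-1a collisions.
  fn same_output(a: &str, b: &str) -> bool {
      a.len() == b.len() && fnv1a_64(a.as_bytes()) == fnv1a_64(b.as_bytes())
  }

  fn main() {
      assert!(same_output("cargo build ok", "cargo build ok"));
      assert!(!same_output("cargo build ok", "cargo build err"));
  }
  ```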

- Summarize fallback: raw outputs over 500 lines (configurable) are
  replaced with a dense ≤40-line summary (top errors, top files, test
  summary, last 20 lines verbatim) instead of running through the
  per-handler truncation pipeline.
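
  The fallback's shape might look roughly like this (thresholds and selection heuristics are illustrative only, not the actual summarizer):

  ```rust
  // Toy summarize fallback: keep error lines and the final tail verbatim,
  // cap the report at a fixed line budget. Illustrative only.
  fn summarize(raw: &str, max_lines: usize) -> String {
      let lines: Vec<&str> = raw.lines().collect();
      let mut out: Vec<String> = Vec::new();
      out.push(format!("# squeez summary: {} lines collapsed", lines.len()));
      // top errors first (simple substring match as a stand-in)
      for l in lines.iter().filter(|l| l.to_lowercase().contains("error")).take(10) {
          out.push(l.to_string());
      }
      // then the last 20 lines verbatim
      let tail = lines.len().saturating_sub(20);
      for l in &lines[tail..] {
          out.push(l.to_string());
      }
      out.truncate(max_lines);
      out.join("\n")
  }

  fn main() {
      let raw = (0..600).map(|i| format!("line {i}")).collect::<Vec<_>>().join("\n");
      let s = summarize(&raw, 40);
      assert!(s.lines().count() <= 40);
      assert!(s.contains("line 599")); // tail preserved
  }
  ```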

- SessionContext persists at sessions/context.json next to current.json
  with bounded ring buffers (32 calls, 256 files, 128 errors, 64 git
  refs). Hand-rolled flat-array JSON via the existing json_util — no
  serde, zero new dependencies.
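
  The bounded-history idea can be sketched with a simple capped deque (illustrative, not the actual SessionContext layout):

  ```rust
  use std::collections::VecDeque;

  // Minimal bounded ring buffer in the spirit of SessionContext's capped
  // histories (32 calls, 256 files, ...): once full, the oldest entry is dropped.
  struct Ring<T> {
      cap: usize,
      buf: VecDeque<T>,
  }

  impl<T> Ring<T> {
      fn new(cap: usize) -> Self {
          Ring { cap, buf: VecDeque::with_capacity(cap) }
      }
      fn push(&mut self, item: T) {
          if self.buf.len() == self.cap {
              self.buf.pop_front(); // evict oldest
          }
          self.buf.push_back(item);
      }
      fn len(&self) -> usize {
          self.buf.len()
      }
  }

  fn main() {
      let mut calls = Ring::new(32);
      for i in 0..100 {
          calls.push(i);
      }
      assert_eq!(calls.len(), 32);              // bounded
      assert_eq!(calls.buf.front(), Some(&68)); // oldest surviving entry
  }
  ```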

- Cross-call hint: cat/head/tail/less/more/bat of a file already in
  context emits "# squeez hint: <path> already in context (Read tool,
  call #N)" without blocking execution.

All new behavior is opt-out via four new config keys
(adaptive_intensity, context_cache_enabled, redundancy_cache_enabled,
summarize_threshold_lines). Existing handlers and strategies are
untouched; the wrap.rs diff is ~30 lines.

Tests: 167 passing (was 132). New integration tests cover intensity
boundaries, cache round-trip, redundancy hit/miss, summarize fallback.
Inline unit tests in each context module add ~25 more cases.

Benches: bench/run.sh stays at 12/12; new bench/run_context.sh
exercises the wrap.rs pre/post pass end-to-end (3/3 passing).
CI workflow runs both.
User requested maximum aggression by default. The Lite/Full tiers
remain in the enum (forward-compat) but derive() now returns Ultra
unconditionally when adaptive_intensity is enabled. To opt out of
scaling entirely, set adaptive_intensity = false (falls back to Lite).
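
In sketch form, with hypothetical names:

```rust
// Illustrative tier logic: Lite/Full stay in the enum for forward
// compatibility, but derivation returns Ultra whenever adaptive intensity
// is enabled. Not squeez's actual derive() implementation.
#[derive(Debug, PartialEq)]
enum Intensity {
    Lite,
    Full,
    Ultra,
}

fn derive_intensity(adaptive_enabled: bool) -> Intensity {
    if adaptive_enabled {
        Intensity::Ultra // maximum aggression by default
    } else {
        Intensity::Lite // opt-out fallback
    }
}

fn main() {
    assert_eq!(derive_intensity(true), Intensity::Ultra);
    assert_eq!(derive_intensity(false), Intensity::Lite);
}
```
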
PR2 of two. Stacks on the context engine from PR1 and lands the user-facing
features that make squeez attack input + output + memory-file tokens at
maximum aggression by default.

Highlights:

- squeez compress-md — pure-Rust, zero-LLM caveman-style markdown
  compressor. Line-based state machine preserves fenced code blocks,
  inline code, URLs (bare + markdown link targets), headings, tables,
  list markers, and version strings; only natural-language prose is
  compressed. Drops articles + fillers + pleasantries + hedging
  everywhere. --ultra additionally substitutes with→w/, function→fn,
  configuration→config, repository→repo, etc. Damage heuristic aborts
  the write if any code block, URL, or heading was lost. Backups go
  to <stem>.original.md and are never overwritten.
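
  The fence-preserving state machine can be illustrated with a toy version (the real compressor's prose pass, URL handling, and damage heuristic are far richer):

  ````rust
  // Toy line-based state machine: fenced code blocks and headings pass through
  // untouched; a stand-in prose pass handles everything else. Illustrative only.
  fn compress_md(input: &str) -> String {
      let mut in_fence = false;
      let mut out = Vec::new();
      for line in input.lines() {
          if line.trim_start().starts_with("```") {
              in_fence = !in_fence;        // toggle on every fence marker
              out.push(line.to_string());  // fences always preserved verbatim
          } else if in_fence || line.starts_with('#') {
              out.push(line.to_string());  // code + headings preserved
          } else {
              // stand-in prose pass: drop the article "the "
              out.push(line.replace("the ", ""));
          }
      }
      out.join("\n")
  }

  fn main() {
      let md = "# Title\nthe quick fox\n```\nlet x = the_val;\n```";
      let c = compress_md(md);
      assert!(c.contains("the_val"));  // code untouched
      assert!(c.contains("quick fox") && !c.contains("the quick"));
  }
  ````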

- Caveman persona — three intensity levels (lite/full/ultra) shipped
  via include_str!. Default is Ultra. Injected into the squeez init
  banner (Claude Code) and into the <!-- squeez:start --> block in
  ~/.copilot/copilot-instructions.md (Copilot CLI).

- auto_compress_md=true — squeez init runs compress-md --all silently
  on every session start. Idempotent: backup is only written once and
  the integrity check guarantees safety.

- squeez update — self-updater. Shells out to curl + sha256sum/shasum
  (both already required by install.sh) so we stay zero-dep. Atomic
  on Unix via .new + rename; documented .new + manual move on Windows.
  Override base URL via SQUEEZ_UPDATE_URL_OVERRIDE for tests.
  --check reports without installing; --insecure skips checksum.
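
  The staging-plus-rename pattern, sketched with illustrative temp-file paths rather than squeez's real install layout:

  ```rust
  use std::fs;

  // Write the new binary to <target>.new, then rename over the original.
  // On Unix, rename within one filesystem is atomic, so a crash never
  // leaves a half-written binary in place.
  fn atomic_replace(target: &str, new_bytes: &[u8]) -> std::io::Result<()> {
      let staging = format!("{target}.new");
      fs::write(&staging, new_bytes)?; // full write to the staging path first
      fs::rename(&staging, target)?;   // atomic swap
      Ok(())
  }

  fn main() -> std::io::Result<()> {
      let path = std::env::temp_dir().join("squeez-demo-bin");
      let target = path.to_str().unwrap();
      fs::write(target, b"old")?;
      atomic_replace(target, b"new")?;
      assert_eq!(fs::read(target)?, b"new");
      fs::remove_file(target)?;
      Ok(())
  }
  ```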

- squeez track-result + posttooluse hook update — PostToolUse now
  pipes the raw JSON for every tool call (Read, Grep, LS, Glob, ...)
  into squeez track-result, which extracts file paths + errors from
  the result content and feeds them into SessionContext. Future bash
  calls dedup against this state via the cross-call hint added in PR1.

- Config additions: persona (default Ultra), auto_compress_md (default
  true). Both opt-out via config.ini.

Tests: 230 passing (was 162). New integration tests for compress_md
(13), persona (8), track_result (6), update (8). New mdcompress_*
bench fixtures show ~22-27% reduction (caveman LLM gets ~45% with API
calls; we get ~25% with zero API calls in <5ms).
feat(context): aggressive token-engine — adaptive intensity, redundancy cache, summarize fallback
…update

feat(pr2): compress-md, caveman persona, squeez update, track-result hook
- Updated "What it does" with context engine, persona, compress-md, track-result
- Added summarize_huge and intensity_budget80 rows to benchmark table (14/14 pass)
- Updated changelog with 2026-04-07 entries for all new features
- Updated PostToolUse description to reflect full artifact extraction
…ken budget

- jq/yq/terraform/tofu/helm/pulumi: DataToolHandler (terraform: strip unchanged
  resources, keep +/-/~ and Plan: lines; helm: strip boilerplate; jq/yq: drop
  empty/null lines + truncate)
- grep/rg/awk/sed: TextProcHandler (collapse files with ≥5 matches to summary
  line; dedup + truncate)
- podman, nextest, az: added to filter.rs routing table
- RECENT_WINDOW: 8→16 (catch repeated outputs in longer call sequences)
- MIN_LINES: 5→2 (short outputs like `git status` 2-liner now eligible for dedup)
- compact_threshold_tokens default: 160K→120K (50K safety margin for sonnet-4-6
  200K context; avoids truncation from protocol overhead + system prompt)
- Per-tool token breakdown in compact warning: Bash/Read/Other separate counters
  persisted to context.json via SessionContext.note_tool_tokens()
- README: deduplicated benchmark header, updated config defaults, handler table,
  redundancy window and compact warning documentation
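
The ≥5-match collapse rule for grep/rg-style output, in a simplified illustrative form (not the actual TextProcHandler):

```rust
use std::collections::BTreeMap;

// Per-file match counts at or above a threshold become one summary line;
// files below the threshold keep their matches verbatim. Illustrative only.
fn collapse_grep(output: &str, threshold: usize) -> String {
    let mut by_file: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for line in output.lines() {
        // grep -n style: path:line:match
        if let Some((path, _)) = line.split_once(':') {
            by_file.entry(path).or_default().push(line);
        }
    }
    let mut out = Vec::new();
    for (path, lines) in by_file {
        if lines.len() >= threshold {
            out.push(format!("{path}: {} matches (collapsed)", lines.len()));
        } else {
            out.extend(lines.iter().map(|l| l.to_string()));
        }
    }
    out.join("\n")
}

fn main() {
    let raw = (1..=6)
        .map(|i| format!("src/a.rs:{i}:hit"))
        .collect::<Vec<_>>()
        .join("\n")
        + "\nsrc/b.rs:1:hit";
    let c = collapse_grep(&raw, 5);
    assert!(c.contains("src/a.rs: 6 matches (collapsed)"));
    assert!(c.contains("src/b.rs:1:hit")); // below threshold, kept verbatim
}
```
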
fix: resolve gap analysis — new handlers, wider redundancy window, token budget
@claudioemmanuel

Superseded — promoting develop directly with all gap fixes included.

