Skip to content

fix(proxy): guard CCR tool injection against frozen prefix to preserve cache#298

Merged
chopratejas merged 1 commit intochopratejas:mainfrom
SwiftWing21:fix/ccr-tool-injection-frozen-guard
Apr 28, 2026
Merged

fix(proxy): guard CCR tool injection against frozen prefix to preserve cache#298
chopratejas merged 1 commit intochopratejas:mainfrom
SwiftWing21:fix/ccr-tool-injection-frozen-guard

Conversation

@SwiftWing21
Copy link
Copy Markdown
Contributor

Summary

Fixes #294 — the CCR tool injection path had no frozen_message_count guard, so the first Kompress call in a session mutated the tools array and busted Anthropic's prefix cache (dropping cache_read_input_tokens from ~48K → 0).

The system instruction injection path already had this guard at proxy/handlers/anthropic.py:963-968. This PR mirrors it for inject_tool — the simplest version of the fix suggested in the issue.

Change

headroom/proxy/handlers/anthropic.py — when frozen_message_count > 0, set inject_tool=False and log a deferral message (matching the existing system-instruction guard's tone). The injector is still constructed (it may still need to scan for compressed content), but it will not mutate the tools array.

inject_tool = self.config.ccr_inject_tool
if inject_tool and frozen_message_count > 0:
    logger.info(
        f"[{request_id}] CCR: deferring tool injection "
        f"(frozen prefix={frozen_message_count}) to preserve cache"
    )
    inject_tool = False

Test

Adds test_ccr_tool_injection_disabled_when_prefix_frozen in tests/test_proxy_anthropic_cache_stability.py — a direct companion to the existing test_ccr_system_instruction_injection_disabled_when_prefix_frozen, using the same _FakePrefixTracker + monkeypatched CCRToolInjector pattern. Asserts the injector receives inject_tool=False when frozen_count=1.

Test plan

  • ruff check on modified files — clean
  • ruff format --check on modified files — clean
  • Full test_proxy_anthropic_cache_stability.py — 17/17 pass (including the new test)
  • Broader CCR suites (test_ccr*, test_proxy_ccr) — 140 pass; the only failures are pre-existing headroom._core Rust-extension import errors unrelated to this path
  • CI: cargo fmt / clippy / Rust workspace tests (not run locally — no Rust toolchain on dev box; this PR touches only Python)

Notes

The issue suggests a more complete variant that defers tool injection until the next call where frozen_message_count == 0 (i.e. injects "for free" during a TTL re-warm). That's a follow-up — this PR ships the simple, symmetric fix that resolves the cache bust. Happy to extend if maintainers prefer the deferral approach.

🤖 Generated with Claude Code

…e cache

The Anthropic handler's CCR injector path applied a frozen_message_count
guard to system instruction injection but not to tool injection. When
Kompress fired for the first time in a session, the tools array was
mutated unconditionally, invalidating Anthropic's prefix cache and
dropping cache_read_input_tokens to zero on calls where ~48K tokens
were previously being cached.

Mirror the existing inject_system_instructions guard for inject_tool:
when frozen_message_count > 0, defer tool injection so the warm prefix
stays intact.

Adds test_ccr_tool_injection_disabled_when_prefix_frozen as a direct
companion to the existing test_ccr_system_instruction_injection_
disabled_when_prefix_frozen.

Fixes chopratejas#294

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 28, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

@chopratejas
Copy link
Copy Markdown
Owner

Thank you for fixing this! Appreciate it - looking forward to the follow up PR

@chopratejas chopratejas merged commit 9d5f15b into chopratejas:main Apr 28, 2026
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

CCR tool injection has no frozen-prefix guard, causing full Anthropic cache busts on first Kompress call

2 participants