diff --git a/RUST_DEV.md b/RUST_DEV.md
index 614ffacc..135b9a7f 100644
--- a/RUST_DEV.md
+++ b/RUST_DEV.md
@@ -159,6 +159,37 @@
 The recorder monkey-patches the in-process transform classes (see
 `record_all()` in `tests/parity/recorder.py`). It does **not** modify
 any file under `headroom/`.
 
+## Known regressions in retired-Python components
+
+The Stage 3b/3c.1b retirements deleted the Python source for
+`DiffCompressor` and `SmartCrusher` and replaced them with
+PyO3-delegating shims. The 2026-04-28 audit found that the retirements
+shipped with several subsystems silently disconnected. This section
+tracks each gap and its disposition so they don't regress further or
+get forgotten.
+
+### SmartCrusher
+
+| Subsystem | State | Tracked by |
+|---|---|---|
+| TOIN learning loop | **Re-attached 2026-04-28.** The shim's `crush()` and `_smart_crush_content()` now call `toin.record_compression()` after a real compression, filtered on `strategy != "passthrough"` to ignore JSON re-canonicalization. Best-effort: TOIN failures are logged at debug level and don't break compression. | `tests/test_smart_crusher_toin_attachment.py` |
+| CCR marker emission knob | **Open gap.** `ccr_config.inject_retrieval_marker=False` is not honored — the Rust port emits `<>` markers unconditionally as part of `dropped_summary`. The shim now logs a WARNING when callers pass `False`. **Fix needed:** add an `enable_ccr_marker: bool` gate in `crates/headroom-core/src/transforms/smart_crusher/crusher.rs::crush_array` and plumb it through `SmartCrusherConfig` and the PyO3 bridge. | This file + warning at `headroom/transforms/smart_crusher.py` |
+| Custom relevance scorer | **Open gap.** The `relevance_config` and `scorer` constructor args are accepted for source compatibility, but the Rust default `HybridScorer` always runs. The shim now logs a WARNING (previously debug). **Fix needed:** expose a Python-bridged scorer constructor surface from `crates/headroom-core/src/relevance/`. | Warning at `headroom/transforms/smart_crusher.py` |
+| Per-tool TOIN learning hook | **Partially re-attached.** `_smart_crush_content` accepts `tool_name` and now threads it into the TOIN record. The hook is best-effort — it improves `query_context` aggregation but doesn't drive per-tool overrides yet. | `tests/test_smart_crusher_toin_attachment.py::test_smart_crush_content_records_to_toin` |
+
+### DiffCompressor
+
+| Subsystem | State |
+|---|---|
+| Adaptive context windows | Honored byte-for-byte (parity fixture-locked). |
+| TOIN integration | Never had one — DiffCompressor records via `_record_to_toin` in `ContentRouter`, which already runs for non-SmartCrusher strategies. No regression. |
+
+### Watch list (potential regressions, not yet audited)
+
+- `CCRConfig.enabled=False` end-to-end behavior. The Rust port's CCR store is currently controlled by builder selection, not by a config flag; the disabled path hasn't been audited.
+- `SmartCrusherConfig.use_feedback_hints=False` — the field is forwarded to Rust, but whether the Rust crusher honors the disabled path hasn't been verified against a parity fixture.
+
+When any item above changes, update both this section and the test
+file. The shim's docstring also references this section — keep them
+aligned.
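+
+For reference, the recording contract the SmartCrusher rows above rely
+on — a minimal sketch against the shim's public surface (the final
+`recorded` check mirrors the shim's internal filter; it is not an API
+you call):
+
+```python
+import json
+
+from headroom.transforms.smart_crusher import SmartCrusher, SmartCrusherConfig
+
+# Placeholder payload — any sizeable JSON array of dicts works.
+payload_json = json.dumps([{"status": "ok", "n": i} for i in range(60)])
+
+crusher = SmartCrusher(SmartCrusherConfig())
+result = crusher.crush(payload_json, query="error logs")
+
+# TOIN recording fires only on a real compression: `was_modified`
+# alone is not enough, because pure JSON re-canonicalization can
+# flip it while the strategy stays "passthrough".
+recorded = result.was_modified and result.strategy != "passthrough"
+```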
+
 ## Phase 0 Blockers
 
 These are known limitations for Phase 0.
 They are tracked here so Phase 1
diff --git a/headroom/transforms/smart_crusher.py b/headroom/transforms/smart_crusher.py
index 1ae5ac5f..66dddaac 100644
--- a/headroom/transforms/smart_crusher.py
+++ b/headroom/transforms/smart_crusher.py
@@ -18,18 +18,28 @@
 fallback. Build it locally with `scripts/build_rust_extension.sh`
 (wraps `maturin develop`) or install a prebuilt wheel.
 
-Stage 3c.1 deliberately keeps the optional subsystems (TOIN,
-feedback, CCR marker injection, telemetry) disabled in the Rust port.
-The shim accepts `relevance_config`, `scorer`, and `ccr_config`
-constructor args for source compatibility but does not wire them
-through — they re-attach in Stage 3c.2 when those subsystems land in
-Rust. CCR marker injection in `_smart_crush_content` is a Stage 3c.2
-follow-up; today the Rust port never emits CCR markers, so the
-disabled-path behavior is byte-equal.
+# Functionality state (post-audit, 2026-04-28)
+
+- **TOIN learning** — re-attached. `crush()` and `_smart_crush_content`
+  call `toin.record_compression()` after a real compression (filtered
+  on `strategy != "passthrough"` to ignore JSON re-canonicalization).
+  The retired Python class did this inline; the bridge keeps the
+  highest-traffic strategy fueling the learning loop.
+- **CCR marker emission** — partial gap. The Rust port emits
+  `<>` markers unconditionally as part of
+  `dropped_summary`; the `ccr_config.inject_retrieval_marker=False`
+  knob is therefore not honored end-to-end. The shim now logs a
+  WARNING when callers pass `False`. Suppression needs a Rust-side
+  gate; see RUST_DEV.md.
+- **Custom relevance scorer / scorer override** — still not wired.
+  `relevance_config` and `scorer` constructor args are accepted for
+  source compatibility, but the Rust default `HybridScorer` is always
+  used. The shim now logs a WARNING (was: debug) when these are passed.
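+
+A sketch of what callers see for the CCR knob today (constructor args
+as in this module; the warning is emitted by this module's
+`logging.getLogger(__name__)` logger):
+
+    crusher = SmartCrusher(
+        SmartCrusherConfig(),
+        ccr_config=CCRConfig(enabled=True, inject_retrieval_marker=False),
+    )
+    # → one WARNING at construction time noting the flag is not
+    #   honored; compressed output still carries the markers.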
 """
 
 from __future__ import annotations
 
+import json
 import logging
 from dataclasses import dataclass
 from typing import Any
@@ -159,14 +169,26 @@ def __init__(
         self._with_compaction = with_compaction
 
         # CCR config is preserved on `self` for callers that read it
-        # back (`headroom.proxy.server` does), but the Rust port doesn't
-        # exercise it: Stage 3c.1 keeps CCR marker injection disabled
-        # because the Rust port has no compression store. When Stage
-        # 3c.2 lands the CCR port, this wires through to Rust.
+        # back (`headroom.proxy.server` does). Storage-side semantics
+        # (CCR cache lookups) are honored via the Rust crusher's own
+        # CCR store. *Marker emission* is the gap: the Rust port emits
+        # `<>` markers unconditionally as
+        # part of `dropped_summary`, so `inject_retrieval_marker=False`
+        # has no effect on the prompt today. We log a WARNING (not
+        # debug) so it's visible. Suppression of marker emission needs
+        # a Rust-side gate; tracked in RUST_DEV.md.
         if ccr_config is None:
             self._ccr_config = CCRConfig(enabled=True, inject_retrieval_marker=False)
         else:
             self._ccr_config = ccr_config
+        if not self._ccr_config.inject_retrieval_marker:
+            logger.warning(
+                "SmartCrusher: ccr_config.inject_retrieval_marker=False "
+                "is currently NOT honored by the Rust port — CCR row-drop "
+                "markers (`<>`) will still appear "
+                "in compressed output. Tracked as a known regression in "
+                "RUST_DEV.md until a Rust-side gate lands."
+            )
 
         # `relevance_config` and `scorer` are accepted for source
         # compatibility but currently dropped — Stage 3c.1 ships with
@@ -174,12 +196,18 @@ def __init__(
         # Stage 3c.2 when the relevance crate gains a Python-bridged
        # constructor surface.
         if relevance_config is not None or scorer is not None:
-            logger.debug(
-                "SmartCrusher: relevance_config/scorer args are ignored in "
-                "Stage 3c.1 (Rust port uses default HybridScorer). They "
-                "will be wired through in Stage 3c.2."
+            logger.warning(
+                "SmartCrusher: custom relevance_config/scorer args are "
+                "currently ignored (Rust port uses default HybridScorer). "
+                "Tracked as a known regression in RUST_DEV.md."
             )
 
+        # Lazy TOIN handle. Loaded on the first compression that has
+        # items to learn from. Skipping the import at __init__ keeps
+        # cold-start fast for environments where telemetry is disabled.
+        self._toin: Any = None
+        self._toin_load_failed = False
+
         # Build the Rust crusher with every field from the Python
         # config, plus the relevance_threshold default (0.3) — the
         # Python dataclass doesn't carry that field; it lives on
@@ -221,6 +249,26 @@ def crush(self, content: str, query: str = "", bias: float = 1.0) -> CrushResult
         working.
         """
         r = self._rust.crush(content, query, bias)
+        # Re-attach the TOIN learning loop. The retired Python class
+        # recorded compressions into TOIN inline; the Rust port doesn't
+        # know about TOIN, and `ContentRouter._record_to_toin` skips
+        # SmartCrusher on the assumption that SmartCrusher records its
+        # own. Bridging the gap here keeps JSON-array compressions
+        # fueling the learning system.
+        #
+        # Filter on `was_modified AND strategy != "passthrough"`. The
+        # Rust crusher sometimes flips `was_modified=True` from pure
+        # JSON re-canonicalization (whitespace normalization) without
+        # actually compressing — the strategy stays `"passthrough"` in
+        # that case, and there's no learning value in recording it.
+        if r.was_modified and r.strategy != "passthrough":
+            self._record_to_toin(
+                original=content,
+                compressed=r.compressed,
+                strategy=r.strategy,
+                query_context=query,
+                tool_name=None,
+            )
         return CrushResult(
             compressed=r.compressed,
             original=r.original,
@@ -290,12 +338,103 @@ def _smart_crush_content(
         """Apply smart crushing; return `(crushed, was_modified, info)`.
 
         Mirrors the retired Python method's tuple shape. `tool_name` is
-        accepted for API compatibility and currently ignored — Stage
-        3c.1 has no per-tool TOIN learning hook.
+        threaded through to TOIN's per-tool learning records; if no
+        tool name is available (e.g. the legacy pipeline doesn't have
+        one in scope) the recording uses the content-based signature
+        only.
         """
         crushed, was_modified, info = self._rust.smart_crush_content(content, query_context, bias)
+        # Same passthrough filter as `crush()` — re-canonicalization of
+        # JSON whitespace can flip `was_modified=True` even when the
+        # `info` field reports `passthrough` and no compression happened.
+        if was_modified and info != "passthrough":
+            self._record_to_toin(
+                original=content,
+                compressed=crushed,
+                strategy=info or "smart_crusher",
+                query_context=query_context,
+                tool_name=tool_name,
+            )
         return crushed, was_modified, info
 
+    def _record_to_toin(
+        self,
+        original: str,
+        compressed: str,
+        strategy: str,
+        query_context: str,
+        tool_name: str | None,
+    ) -> None:
+        """Record a successful compression into TOIN's learning store.
+
+        Replaces the inline TOIN call the retired Python SmartCrusher
+        had at the end of its compression path. Best-effort: TOIN
+        failures are logged at debug level and never bubble — the
+        compression itself has already happened and is correct.
+
+        Token estimates use the `len(json) // 4` rule the retired
+        implementation used. The router doesn't pass a tokenizer down
+        this far, and re-tokenizing here would dominate the recording
+        cost. Rough estimates are fine for learning aggregates.
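+
+        Worked example (illustrative numbers only): a 2,000-char
+        original compressed to 400 chars is recorded as 500 → 100
+        estimated tokens.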
+        """
+        if self._toin_load_failed:
+            return
+        try:
+            try:
+                items = json.loads(original)
+            except (json.JSONDecodeError, ValueError):
+                # Not JSON — nothing structural for TOIN to learn from
+                # at the array level. The Rust crusher only sets
+                # `was_modified=True` on JSON-array inputs, so this
+                # branch is rare; bail quietly.
+                return
+            if not isinstance(items, list):
+                return
+
+            from ..telemetry.models import ToolSignature
+
+            signature = ToolSignature.from_items(items)
+            original_tokens = max(1, len(original) // 4)
+            compressed_tokens = max(1, len(compressed) // 4)
+
+            if self._toin is None:
+                from ..telemetry.toin import get_toin
+
+                self._toin = get_toin()
+
+            # Extract the kept-row count from the compressed payload
+            # when possible. The lossy path emits a JSON array with a
+            # `_ccr_dropped` sentinel suffix; the lossless path emits
+            # CSV-schema or compact JSON. For the array case we get an
+            # exact compressed_count; otherwise fall back to the rough
+            # `original_count` (TOIN cares more about structural
+            # signature than count precision).
+            try:
+                compressed_parsed = json.loads(compressed)
+                compressed_count = (
+                    len(strip_ccr_sentinels(compressed_parsed))
+                    if isinstance(compressed_parsed, list)
+                    else len(items)
+                )
+            except (json.JSONDecodeError, ValueError):
+                compressed_count = len(items)
+
+            self._toin.record_compression(
+                tool_signature=signature,
+                original_count=len(items),
+                compressed_count=compressed_count,
+                original_tokens=original_tokens,
+                compressed_tokens=compressed_tokens,
+                strategy=strategy,
+                query_context=query_context if query_context else None,
+                items=items[:5],  # Sample for field-level learning
+            )
+        except ImportError:
+            # TOIN module not installed in this build — disable for
+            # the lifetime of this crusher to avoid retry overhead.
+            self._toin_load_failed = True
+        except Exception as e:  # pragma: no cover - best effort
+            logger.debug("SmartCrusher TOIN recording failed (non-fatal): %s", e)
+
     def _extract_context_from_messages(self, messages: list[dict[str, Any]]) -> str:
         """Build a query string from the last 5 user messages + recent
         assistant tool-call arguments. Used by `apply()` to derive the
diff --git a/tests/test_smart_crusher_toin_attachment.py b/tests/test_smart_crusher_toin_attachment.py
new file mode 100644
index 00000000..25e79726
--- /dev/null
+++ b/tests/test_smart_crusher_toin_attachment.py
@@ -0,0 +1,207 @@
+"""TOIN re-attachment regression tests for the Rust-backed SmartCrusher.
+
+When SmartCrusher's Python implementation was retired in Stage 3c.1b,
+its inline `toin.record_compression()` call was lost. The Rust port
+doesn't know about TOIN, and `ContentRouter._record_to_toin` skips
+SmartCrusher on the assumption that SmartCrusher records its own. The
+net result was a silent regression: JSON-array compressions — the
+highest-traffic strategy — stopped feeding the learning loop.
+
+The shim now bridges the gap. These tests assert that:
+
+1. Calling `SmartCrusher.crush(...)` on a JSON array of dicts produces
+   a `record_compression` event in TOIN when the input is large enough
+   to actually compress.
+2. Pass-through inputs (no modification) do NOT record — TOIN only
+   learns from real compression events.
+3. The recorded `tool_signature.structure_hash` is computed over the
+   parsed items, so two compressions of structurally similar inputs
+   land on the same pattern.
+4. Calling `_smart_crush_content(...)` (the legacy `apply()` path)
+   also records.
+5. TOIN failures are non-fatal — compression completes even if the
+   telemetry import or call raises.
+"""
+
+from __future__ import annotations
+
+import tempfile
+from pathlib import Path
+from typing import Any
+from unittest.mock import patch
+
+import pytest
+
+from headroom.telemetry.toin import TOINConfig, get_toin, reset_toin
+from headroom.transforms.smart_crusher import SmartCrusher, SmartCrusherConfig
+
+
+@pytest.fixture
+def fresh_toin():
+    reset_toin()
+    with tempfile.TemporaryDirectory() as tmpdir:
+        storage = str(Path(tmpdir) / "toin.json")
+        toin = get_toin(
+            TOINConfig(
+                storage_path=storage,
+                auto_save_interval=0,
+            )
+        )
+        yield toin
+        reset_toin()
+
+
+def _bigger_array(n: int = 60) -> str:
+    """JSON array of `n` dicts, large enough to trigger crushing.
+
+    Items use a low-uniqueness shape (`{"status": "ok", "tag": "x"}`)
+    so the analyzer recommends compaction or row drops. We need at
+    least 200 tokens (the `min_tokens_to_crush` default) to enter the
+    crusher; 60 items keeps us well above that.
+    """
+    items = [{"status": "ok", "tag": "x", "n": i} for i in range(n)]
+    import json as _json
+
+    return _json.dumps(items)
+
+
+# ─── crush() path (router-style) ───────────────────────────────────────
+
+
+def test_crush_records_to_toin_on_modification(fresh_toin):
+    crusher = SmartCrusher(SmartCrusherConfig())
+    payload = _bigger_array(60)
+    pre = sum(p.total_compressions for p in fresh_toin._patterns.values())
+
+    result = crusher.crush(payload, query="test query", bias=1.0)
+
+    if not result.was_modified:
+        pytest.skip("payload didn't trigger compression — bump the size")
+    post = sum(p.total_compressions for p in fresh_toin._patterns.values())
+    assert post > pre, "TOIN should have recorded a compression event"
+
+
+def test_crush_does_not_record_on_passthrough(fresh_toin):
+    """A small input the analyzer doesn't compress should produce no
+    TOIN recording, even if the Rust port flipped `was_modified=True`
+    from JSON whitespace re-canonicalization. The strategy stays
+    `passthrough` in that case and we filter on it."""
+    crusher = SmartCrusher(SmartCrusherConfig())
+    payload = '[{"id": 1}]'  # Single item — below min_items_to_analyze.
+    pre = sum(p.total_compressions for p in fresh_toin._patterns.values())
+
+    result = crusher.crush(payload, query="", bias=1.0)
+    assert result.strategy == "passthrough"
+    post = sum(p.total_compressions for p in fresh_toin._patterns.values())
+    assert post == pre, "no recording when strategy is passthrough"
+
+
+def test_crush_signature_groups_similar_inputs(fresh_toin):
+    """Two structurally similar payloads should record under the same
+    tool signature so TOIN can aggregate the pattern across calls."""
+    crusher = SmartCrusher(SmartCrusherConfig())
+
+    payload_a = _bigger_array(60)
+    payload_b = _bigger_array(80)
+
+    crusher.crush(payload_a, query="", bias=1.0)
+    crusher.crush(payload_b, query="", bias=1.0)
+
+    if not fresh_toin._patterns:
+        pytest.skip("neither payload compressed — bump the size")
+    # Both share the field shape {status, tag, n} → same structure hash
+    # → one pattern with at least 2 recordings.
+    pattern_counts = {h: p.total_compressions for h, p in fresh_toin._patterns.items()}
+    assert max(pattern_counts.values()) >= 2, (
+        f"expected the same pattern to be recorded twice, got {pattern_counts}"
+    )
+
+
+# ─── _smart_crush_content() path (legacy apply()) ──────────────────────
+
+
+def test_smart_crush_content_records_to_toin(fresh_toin):
+    crusher = SmartCrusher(SmartCrusherConfig())
+    payload = _bigger_array(60)
+    pre = sum(p.total_compressions for p in fresh_toin._patterns.values())
+
+    crushed, was_modified, _info = crusher._smart_crush_content(
+        payload, query_context="user query", tool_name="get_records", bias=1.0
+    )
+
+    if not was_modified:
+        pytest.skip("payload didn't trigger compression")
+    post = sum(p.total_compressions for p in fresh_toin._patterns.values())
+    assert post > pre
+
+
+# ─── Failure modes ─────────────────────────────────────────────────────
+
+
+def test_toin_failure_does_not_break_compression(fresh_toin):
+    """If TOIN's `record_compression` raises, compression still
+    completes and returns a valid CrushResult. Telemetry is
+    best-effort; the request must not fail."""
+    crusher = SmartCrusher(SmartCrusherConfig())
+    payload = _bigger_array(60)
+
+    def _boom(*_a: Any, **_kw: Any) -> None:
+        raise RuntimeError("simulated TOIN backend down")
+
+    with patch.object(fresh_toin, "record_compression", side_effect=_boom):
+        result = crusher.crush(payload, query="", bias=1.0)
+    assert isinstance(result.compressed, str)
+    # The crusher itself should still report a result regardless of
+    # whether the underlying analysis chose to modify.
+
+
+def test_non_json_input_does_not_record(fresh_toin):
+    """Non-JSON input shouldn't blow up — it just doesn't record.
+    The Rust crusher returns `was_modified=False` for non-arrays;
+    even if it didn't, the helper guards against `json.loads`
+    failure."""
+    crusher = SmartCrusher(SmartCrusherConfig())
+    pre = sum(p.total_compressions for p in fresh_toin._patterns.values())
+
+    result = crusher.crush("not json at all", query="", bias=1.0)
+    assert result.was_modified is False  # passthrough
+    post = sum(p.total_compressions for p in fresh_toin._patterns.values())
+    assert post == pre
+
+
+# ─── CCR marker knob ───────────────────────────────────────────────────
+
+
+def test_ccr_inject_marker_false_logs_warning(monkeypatch):
+    """Until the Rust port honors the marker-suppression flag, we want
+    a visible warning when callers ask for it. This guards against the
+    silent regression that started this audit.
+
+    Why monkeypatch instead of `caplog`: `caplog` flakes under the full
+    suite — earlier tests can leave `logging.disable()` or root-handler
+    state that suppresses the named logger's records before the test's
+    filter sees them. Patching `logger.warning` on the module bypasses
+    every level/disable/handler concern; we only assert that the
+    constructor *called* `logger.warning` with the flag name in the
+    message.
+ """ + from headroom.config import CCRConfig + from headroom.transforms import smart_crusher as sc_module + + captured: list[str] = [] + real_warning = sc_module.logger.warning + + def _capture(msg: str, *args: object, **kwargs: object) -> None: + captured.append(msg % args if args else msg) + real_warning(msg, *args, **kwargs) + + monkeypatch.setattr(sc_module.logger, "warning", _capture) + + SmartCrusher( + SmartCrusherConfig(), + ccr_config=CCRConfig(enabled=True, inject_retrieval_marker=False), + ) + + assert any("inject_retrieval_marker=False" in m for m in captured), ( + "expected a WARNING about the unsupported marker-suppression flag" + )