
feat(rust): SmartCrusher extension surface — Constraint, Observer, Builder#284

Merged
chopratejas merged 1 commit into main from rust-stage-3c-2-pr1-traits
Apr 27, 2026


@chopratejas
Owner

Summary

First PR in Stage 3c.2 (architectural refactor of the Rust SmartCrusher). Introduces a minimal extension surface that lets OSS users override SmartCrusher behavior and unblocks the Enterprise crate, without attempting the whole refactor in one PR.

Three traits, one builder. No behavior change for existing call sites.

What changed

  • New module: crates/headroom-core/src/transforms/smart_crusher/traits.rs — defines Constraint, Observer, CrushEvent, and re-exports the existing RelevanceScorer as Scorer.
  • New module: constraints.rs, with KeepErrorsConstraint, KeepStructuralOutliersConstraint, and a default_oss_constraints() factory.
  • New module: observer.rs, with TracingObserver, which emits tracing::debug! events on every crush.
  • New module: builder.rs, with SmartCrusherBuilder (with_scorer, add_constraint, add_default_oss_constraints, add_observer, with_default_oss_setup, and build()).
  • crusher.rs: SmartCrusher now holds constraints: Vec<Box<dyn Constraint>> and observers: Vec<Box<dyn Observer>>. SmartCrusher::new(config) is now SmartCrusherBuilder::new(config).with_default_oss_setup().build() (zero behavior change). Crush calls fire observer events.
  • planning.rs: SmartCrusherPlanner takes &[Box<dyn Constraint>]. The 4 hardcoded detect_* call sites are replaced by apply_constraints(...). Pure refactor, identical preserved-index sets.

Why this shape

  • Three traits, not eight. Constraint (must-keep indices), Observer (events), Scorer (already exists). Everything else (TabularCompactor, allocator, formatter) lands in later PRs once we're sure of the seams.
  • Empty default builder. SmartCrusherBuilder::new(config) starts with no scorer, no constraints, and no observers; with_default_oss_setup() is the explicit OSS preset. No silent fallbacks: an empty composition stays empty.
  • Box, not generics. Avoids a monomorphization explosion across Enterprise plugins; the dynamic-dispatch cost is negligible compared to the actual crushing work.
  • Enterprise unblocked. ENT-A (PR 5 in the Stage 3c.2 sequence) can land BusinessRuleConstraint, AuditObserver, and LoopScorer without touching headroom-core at all.
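To illustrate the "Box, not generics" choice, here is a minimal sketch. It assumes a simplified `Constraint` trait in which `items` are plain `&str` payloads rather than parsed JSON; the real signature in traits.rs may differ. `KeepErrorsLike`, the tag rule inside `BusinessRuleConstraint`, and `union_must_keep` are hypothetical. The point is that a core-style constraint and a downstream one can share a single `Vec<Box<dyn Constraint>>`, and one non-generic function handles any mix of them.

```rust
use std::collections::BTreeSet;

// Simplified stand-in for the Constraint trait (assumed shape).
// Real `items` would be parsed JSON; plain &str payloads here.
trait Constraint {
    fn must_keep(&self, items: &[&str], item_strings: &[String]) -> Vec<usize>;
}

// Helper: indices whose cached string contains `needle`.
fn indices_containing(item_strings: &[String], needle: &str) -> Vec<usize> {
    item_strings
        .iter()
        .enumerate()
        .filter(|(_, s)| s.contains(needle))
        .map(|(i, _)| i)
        .collect()
}

// Core-style constraint (toy error heuristic, hypothetical).
struct KeepErrorsLike;

impl Constraint for KeepErrorsLike {
    fn must_keep(&self, _items: &[&str], item_strings: &[String]) -> Vec<usize> {
        indices_containing(item_strings, "error")
    }
}

// Downstream constraint that could live in another crate; the rule is hypothetical.
struct BusinessRuleConstraint {
    required_tag: String,
}

impl Constraint for BusinessRuleConstraint {
    fn must_keep(&self, _items: &[&str], item_strings: &[String]) -> Vec<usize> {
        indices_containing(item_strings, &self.required_tag)
    }
}

// One non-generic function serves any mix of constraint types:
// each must_keep call goes through the vtable.
fn union_must_keep(constraints: &[Box<dyn Constraint>], items: &[&str]) -> BTreeSet<usize> {
    let item_strings: Vec<String> = items.iter().map(|s| s.to_string()).collect();
    constraints
        .iter()
        .flat_map(|c| c.must_keep(items, &item_strings))
        .collect()
}
```

A generic parameter would force every element of the list to be one concrete type, or require a tuple or enum per combination, each monomorphized separately. The boxed version compiles once and pays one virtual call per constraint per crush.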

Test plan

  • 403 unit tests pass (was 388 — 15 new for builder/constraints/observer)
  • 17/17 SmartCrusher parity fixtures byte-equal
  • 185 Python tests pass via PyO3 bridge (no behavior change)
  • make ci-precheck green: ruff, mypy, cargo fmt/clippy/test (1.95.0), commitlint
  • CI green on PR (now includes the Docker hotfix from #283, ci(docker): fix Argument list too long when signing bake outputs)


Stage 3c.2 PR1 — the public extension surface that lets Enterprise
crates plug richer components into SmartCrusher without forking. Three
traits, one builder, behavior-equivalent on every parity fixture.

The three traits:

- Scorer (re-exported from `crate::relevance::RelevanceScorer`).
  Already a trait; OSS default: HybridScorer (BM25 + fastembed).
  Enterprise point: per-tenant Loop-trained scorer.

- Constraint (new in `traits.rs`). `must_keep(items, item_strings)
  -> Vec<usize>` — indices the allocator must keep regardless of
  saliency. OSS defaults: `KeepErrorsConstraint`,
  `KeepStructuralOutliersConstraint` — thin wrappers around the
  existing `detect_error_items_for_preservation` and
  `detect_structural_outliers` functions. Enterprise point:
  BusinessRuleConstraint, RegulatoryConstraint::HIPAA, and so on.

- Observer (new in `traits.rs`). `on_event(&CrushEvent)` fires once
  per top-level `crush()` call with strategy + sizes + elapsed_ns.
  OSS default: TracingObserver, which emits `tracing` events at
  debug level and is cheap when that level is filtered out. Enterprise point:
  AuditObserver, MetricsObserver, LoopTrainingObserver.
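A minimal sketch of the observer flow. It assumes `CrushEvent` carries a strategy label, input and output sizes, and `elapsed_ns` (the field names are guesses), and that `Observer` is `Send + Sync`. `RecordingObserver` and the truncating `crush` function are hypothetical stand-ins, not the crate's code; the real TracingObserver forwards each event to `tracing::debug!` instead of storing it.

```rust
use std::sync::{Arc, Mutex};
use std::time::Instant;

// Assumed event shape: strategy label, sizes, elapsed time.
#[allow(dead_code)]
#[derive(Debug, Clone)]
struct CrushEvent {
    strategy: String,
    input_bytes: usize,
    output_bytes: usize,
    elapsed_ns: u128,
}

// Simplified Observer trait; the Send + Sync bound is an assumption.
trait Observer: Send + Sync {
    fn name(&self) -> &'static str;
    fn on_event(&self, event: &CrushEvent);
}

// Hypothetical observer that stores events, e.g. for an audit or metrics sink.
struct RecordingObserver {
    events: Arc<Mutex<Vec<CrushEvent>>>,
}

impl Observer for RecordingObserver {
    fn name(&self) -> &'static str {
        "recording"
    }
    fn on_event(&self, event: &CrushEvent) {
        self.events.lock().unwrap().push(event.clone());
    }
}

// Toy top-level crush: keeps the first `max_items` items, then fires
// exactly one event per call to every configured observer.
fn crush(items: &[String], max_items: usize, observers: &[Box<dyn Observer>]) -> Vec<String> {
    let start = Instant::now();
    let output: Vec<String> = items.iter().take(max_items).cloned().collect();
    let event = CrushEvent {
        strategy: "truncate".to_string(),
        input_bytes: items.iter().map(|s| s.len()).sum(),
        output_bytes: output.iter().map(|s| s.len()).sum(),
        elapsed_ns: start.elapsed().as_nanos(),
    };
    for observer in observers {
        observer.on_event(&event);
    }
    output
}
```

Building the event once and passing it by reference keeps the per-crush overhead to one allocation plus one virtual call per observer.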

The builder (`builder.rs`):

`SmartCrusherBuilder::new(config)` starts EMPTY (no scorer, no
constraints, no observers; explicit composition, i.e. "no silent
fallbacks" applied to the API surface). Methods chain:
with_scorer, add_constraint, add_default_oss_constraints (appends
KeepErrors + KeepStructuralOutliers), add_observer,
with_default_oss_setup (HybridScorer + default constraints +
TracingObserver in one call).

`SmartCrusher::new(config)` is preserved as the OSS default factory
(equivalent to
`SmartCrusher::builder(config).with_default_oss_setup().build()`).
Every existing caller (proxy, content_router, integrations, evals)
continues to work unchanged.
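The composition rules above can be sketched as follows. Trait bodies are reduced to a hypothetical `name()` method so that ordering can be checked; `Config`, `HybridScorerStub`, `TracingObserverStub`, and the `Option` scorer slot are assumptions of this sketch, and `build()` constructs `SmartCrusher` directly where the PR routes through `from_parts`.

```rust
// Hypothetical minimal traits; the real ones carry scoring,
// must-keep, and event methods.
trait Scorer { fn name(&self) -> &'static str; }
trait Constraint { fn name(&self) -> &'static str; }
trait Observer { fn name(&self) -> &'static str; }

struct HybridScorerStub;
impl Scorer for HybridScorerStub { fn name(&self) -> &'static str { "hybrid" } }
struct KeepErrors;
impl Constraint for KeepErrors { fn name(&self) -> &'static str { "keep_errors" } }
struct KeepStructuralOutliers;
impl Constraint for KeepStructuralOutliers { fn name(&self) -> &'static str { "keep_structural_outliers" } }
struct TracingObserverStub;
impl Observer for TracingObserverStub { fn name(&self) -> &'static str { "tracing" } }

#[allow(dead_code)]
struct Config { max_items: usize } // hypothetical field

#[allow(dead_code)]
struct SmartCrusher {
    config: Config,
    scorer: Option<Box<dyn Scorer>>,
    constraints: Vec<Box<dyn Constraint>>,
    observers: Vec<Box<dyn Observer>>,
}

impl SmartCrusher {
    // OSS default factory: identical to the explicit builder chain.
    fn new(config: Config) -> Self {
        Self::builder(config).with_default_oss_setup().build()
    }
    fn builder(config: Config) -> SmartCrusherBuilder {
        SmartCrusherBuilder::new(config)
    }
}

struct SmartCrusherBuilder {
    config: Config,
    scorer: Option<Box<dyn Scorer>>,
    constraints: Vec<Box<dyn Constraint>>,
    observers: Vec<Box<dyn Observer>>,
}

impl SmartCrusherBuilder {
    // Starts empty: no scorer, no constraints, no observers.
    fn new(config: Config) -> Self {
        Self { config, scorer: None, constraints: Vec::new(), observers: Vec::new() }
    }
    fn with_scorer(mut self, scorer: Box<dyn Scorer>) -> Self {
        self.scorer = Some(scorer);
        self
    }
    // Appends, so insertion order is preserved.
    fn add_constraint(mut self, constraint: Box<dyn Constraint>) -> Self {
        self.constraints.push(constraint);
        self
    }
    fn add_default_oss_constraints(self) -> Self {
        self.add_constraint(Box::new(KeepErrors))
            .add_constraint(Box::new(KeepStructuralOutliers))
    }
    fn add_observer(mut self, observer: Box<dyn Observer>) -> Self {
        self.observers.push(observer);
        self
    }
    fn with_default_oss_setup(self) -> Self {
        self.with_scorer(Box::new(HybridScorerStub))
            .add_default_oss_constraints()
            .add_observer(Box::new(TracingObserverStub))
    }
    fn build(self) -> SmartCrusher {
        SmartCrusher {
            config: self.config,
            scorer: self.scorer,
            constraints: self.constraints,
            observers: self.observers,
        }
    }
}
```

Because `new` only delegates to the builder, existing callers keep their behavior while new callers can compose a crusher piece by piece.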

Internal refactor:

`SmartCrusherPlanner` now holds `&[Box<dyn Constraint>]` and
iterates the configured constraints via a new
`apply_constraints(items, item_strings, keep)` method. Replaces four
hardcoded `detect_structural_outliers` +
`detect_error_items_for_preservation` call sites in the four plan
methods. With the OSS default constraint stack the must-keep set is
byte-identical to pre-PR1 — verified by all 17 parity fixtures.
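A minimal sketch of how a planner could fold configured constraints into its keep set. It assumes `keep` is an ordered index set and reuses a simplified `Constraint` shape with `&str` items. `KeepContaining` and `plan_keep_first` are hypothetical, and the bounds check is a defensive choice made in this sketch; the real plan methods do considerably more than keep a prefix.

```rust
use std::collections::BTreeSet;

// Simplified Constraint trait (assumed shape).
trait Constraint {
    fn must_keep(&self, items: &[&str], item_strings: &[String]) -> Vec<usize>;
}

// Toy constraint: keep any item whose cached string contains a needle.
struct KeepContaining(&'static str);

impl Constraint for KeepContaining {
    fn must_keep(&self, _items: &[&str], item_strings: &[String]) -> Vec<usize> {
        item_strings
            .iter()
            .enumerate()
            .filter(|(_, s)| s.contains(self.0))
            .map(|(i, _)| i)
            .collect()
    }
}

struct SmartCrusherPlanner<'a> {
    constraints: &'a [Box<dyn Constraint>],
}

impl<'a> SmartCrusherPlanner<'a> {
    // Unions every configured constraint's must-keep indices into `keep`.
    fn apply_constraints(&self, items: &[&str], item_strings: &[String], keep: &mut BTreeSet<usize>) {
        for constraint in self.constraints {
            for idx in constraint.must_keep(items, item_strings) {
                // Defensive: ignore indices outside the array.
                if idx < items.len() {
                    keep.insert(idx);
                }
            }
        }
    }

    // Toy plan method: keep the first `k` items, then apply constraints.
    fn plan_keep_first(&self, items: &[&str], item_strings: &[String], k: usize) -> BTreeSet<usize> {
        let mut keep: BTreeSet<usize> = (0..k.min(items.len())).collect();
        self.apply_constraints(items, item_strings, &mut keep);
        keep
    }
}
```

Because the keep set is a union, the result does not depend on the order in which constraints are configured.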

`SmartCrusher` gained two fields: `constraints: Vec<Box<dyn
Constraint>>` and `observers: Vec<Box<dyn Observer>>`. New
`from_parts` constructor (`#[doc(hidden)]`) is the builder's exit
point.

What did NOT change in this PR:

- The internal planning algorithm (lossless tabular, saliency
  scoring, structured markers — those are PR 2/3/4).
- The string/number/object/mixed-array crusher paths in
  `crushers.rs` and the `prioritize_indices` helper in
  `orchestration.rs` — they still call the detection functions
directly. Path B from the design doc: the dict-array path is where
plugins add the most value; lifting the leaf compressors can come
later if customers ask.

Tests:

15 new tests across `traits.rs`, `constraints.rs`, `observer.rs`,
`builder.rs`. Coverage: each constraint trait method called and
pinned (errors flagged, structural outliers detected, item_strings
cache parity, empty-array safety); builder empty-build path,
default-OSS-stack append, add_constraint order preservation,
with_default_oss_setup yields expected counts, observer fires
end-to-end on a real crush; TracingObserver name stable, on_event
doesn't panic.

Verification:
- cargo test --workspace: 403 passed (was 388, +15 new), 0 failed.
- parity: 17/17 byte-equal for smart_crusher.
- make ci-precheck: green.

Stage 3c.2 PR sequence:
- PR 1 (this commit): three traits + builder.
- PR 2 (next): improvement A — TabularCompactor.
- PR 3: improvement B — saliency scoring + structured allocator.
- PR 4: improvement C — structured marker formatter.
- PR 5: ENT-A — `headroom-enterprise` scaffold.
chopratejas merged commit 5b9cd5f into main on Apr 27, 2026
25 checks passed
