feat(rust): SmartCrusher extension surface — Constraint, Observer, Builder #284
Merged
chopratejas merged 1 commit into main on Apr 27, 2026
…ilder

Stage 3c.2 PR1: the public extension surface that lets Enterprise crates plug richer components into SmartCrusher without forking. Three traits, one builder, behavior-equivalent on every parity fixture.

The three traits:

- Scorer (re-exported from `crate::relevance::RelevanceScorer`). Already a trait. OSS default: HybridScorer (BM25 + fastembed). Enterprise point: per-tenant Loop-trained scorer.
- Constraint (new in `traits.rs`). `must_keep(items, item_strings) -> Vec<usize>` returns the indices the allocator must keep regardless of saliency. OSS defaults: `KeepErrorsConstraint` and `KeepStructuralOutliersConstraint`, thin wrappers around the existing `detect_error_items_for_preservation` and `detect_structural_outliers` functions. Enterprise point: BusinessRuleConstraint, RegulatoryConstraint::HIPAA, and so on.
- Observer (new in `traits.rs`). `on_event(&CrushEvent)` fires once per top-level `crush()` call with strategy + sizes + elapsed_ns. OSS default: TracingObserver, which writes to the `tracing` crate at debug level and is zero-cost when filtered out. Enterprise point: AuditObserver, MetricsObserver, LoopTrainingObserver.

The builder (`builder.rs`):

`SmartCrusherBuilder::new(config)` starts EMPTY: no scorer, no constraints, no observers. Composition is explicit ("no silent fallbacks" applied to the API surface). Methods stack:

- `with_scorer`
- `add_constraint`
- `add_default_oss_constraints` (appends KeepErrors + KeepStructuralOutliers)
- `add_observer`
- `with_default_oss_setup` (HybridScorer + default constraints + TracingObserver in one call)

`SmartCrusher::new(config)` is preserved as the OSS default factory (equivalent to `SmartCrusher::builder(config).with_default_oss_setup().build()`). Every existing caller (proxy, content_router, integrations, evals) continues to work unchanged.

Internal refactor:

`SmartCrusherPlanner` now holds `&[Box<dyn Constraint>]` and iterates the configured constraints via a new `apply_constraints(items, item_strings, keep)` method. This replaces the four hardcoded `detect_structural_outliers` + `detect_error_items_for_preservation` call sites in the four plan methods. With the OSS default constraint stack, the must-keep set is byte-identical to pre-PR1, as verified by all 17 parity fixtures.

`SmartCrusher` gained two fields: `constraints: Vec<Box<dyn Constraint>>` and `observers: Vec<Box<dyn Observer>>`. The new `from_parts` constructor (`#[doc(hidden)]`) is the builder's exit point.

What did NOT change in this PR:

- The internal planning algorithm (lossless tabular, saliency scoring, structured markers; those are PR 2/3/4).
- The string/number/object/mixed-array crusher paths in `crushers.rs` and the `prioritize_indices` helper in `orchestration.rs`. They still call the detection functions directly. This is Path B from the design doc: the dict-array path is the primary value plugin point; lifting the leaf compressors can come later if customers ask.

Tests:

15 new tests across `traits.rs`, `constraints.rs`, `observer.rs`, `builder.rs`. Coverage:

- Constraints: each constraint trait method called and pinned (errors flagged, structural outliers detected, item_strings cache parity, empty-array safety).
- Builder: empty-build path, default-OSS-stack append, add_constraint order preservation, with_default_oss_setup yields expected counts, observer fires end-to-end on a real crush.
- TracingObserver: name stable, on_event doesn't panic.

Verification:

- cargo test --workspace: 403 passed (was 388, +15 new), 0 failed.
- parity: 17/17 byte-equal for smart_crusher.
- make ci-precheck: green.

Stage 3c.2 PR sequence:

- PR 1 (this commit): three traits + builder.
- PR 2 (next): improvement A, TabularCompactor.
- PR 3: improvement B, saliency scoring + structured allocator.
- PR 4: improvement C, structured marker formatter.
- PR 5: ENT-A, `headroom-enterprise` scaffold.
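To make the `Constraint` / `apply_constraints` refactor in the commit message concrete, here is a minimal sketch of how a list of boxed constraints could be unioned into one must-keep set. It is not the code from this PR: the `Item` alias, the parameter types, the `BTreeSet` keep set, the bounds check, and the `KeepContaining` and `KeepFirst` rules are all assumptions made for illustration. Only the method name `must_keep`, its `Vec<usize>` return type, and the `apply_constraints(items, item_strings, keep)` argument order come from the description above.

```rust
use std::collections::BTreeSet;

// Stand-in item type (assumption); the real crate most likely passes
// parsed JSON values rather than plain strings.
type Item = String;

// Simplified stand-in for the `Constraint` trait described above.
// Parameter types are assumptions, not the crate's actual signature.
trait Constraint {
    fn must_keep(&self, items: &[Item], item_strings: &[String]) -> Vec<usize>;
}

// Hypothetical business rule: keep every item whose serialized form
// contains a given marker.
struct KeepContaining {
    marker: &'static str,
}

impl Constraint for KeepContaining {
    fn must_keep(&self, _items: &[Item], item_strings: &[String]) -> Vec<usize> {
        item_strings
            .iter()
            .enumerate()
            .filter(|(_, s)| s.contains(self.marker))
            .map(|(i, _)| i)
            .collect()
    }
}

// Hypothetical rule: always keep the first item, if there is one.
struct KeepFirst;

impl Constraint for KeepFirst {
    fn must_keep(&self, items: &[Item], _item_strings: &[String]) -> Vec<usize> {
        if items.is_empty() { Vec::new() } else { vec![0] }
    }
}

// Union every constraint's indices into `keep`. Dropping out-of-bounds
// indices is a defensive choice of this sketch, not a documented behavior.
fn apply_constraints(
    constraints: &[Box<dyn Constraint>],
    items: &[Item],
    item_strings: &[String],
    keep: &mut BTreeSet<usize>,
) {
    for constraint in constraints {
        for idx in constraint.must_keep(items, item_strings) {
            if idx < items.len() {
                keep.insert(idx);
            }
        }
    }
}

// Small demo: three items, two stacked constraints.
fn demo_keep_set() -> Vec<usize> {
    let items: Vec<Item> = vec![
        "{\"status\":\"ok\"}".to_string(),
        "{\"status\":\"ok\"}".to_string(),
        "{\"status\":\"error\",\"code\":500}".to_string(),
    ];
    let item_strings = items.clone();
    let constraints: Vec<Box<dyn Constraint>> = vec![
        Box::new(KeepFirst),
        Box::new(KeepContaining { marker: "error" }),
    ];
    let mut keep = BTreeSet::new();
    apply_constraints(&constraints, &items, &item_strings, &mut keep);
    keep.into_iter().collect()
}
```

In this sketch `demo_keep_set()` yields indices 0 and 2: the first item from `KeepFirst` and the error item from `KeepContaining`. Because the result is a set union, the order in which constraints are registered does not change which indices end up preserved.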
Summary
First PR in Stage 3c.2 (architectural refactor of Rust SmartCrusher). Introduces a minimal, elegant extension surface that lets OSS users override behavior and unblocks the Enterprise crate without boiling the ocean.
Three traits, one builder. No behavior change for existing call sites.
What changed
- `crates/headroom-core/src/transforms/smart_crusher/traits.rs`: defines `Constraint`, `Observer`, `CrushEvent`, and re-exports the existing `RelevanceScorer` as `Scorer`.
- `constraints.rs`: `KeepErrorsConstraint`, `KeepStructuralOutliersConstraint`, plus a `default_oss_constraints()` factory.
- `observer.rs`: `TracingObserver`, which emits `tracing::debug!` events on every crush.
- `builder.rs`: `SmartCrusherBuilder` with `with_scorer`, `add_constraint`, `add_default_oss_constraints`, `add_observer`, `with_default_oss_setup`, and `build()`.
- `crusher.rs`: `SmartCrusher` now holds `constraints: Vec<Box<dyn Constraint>>` and `observers: Vec<Box<dyn Observer>>`. `SmartCrusher::new(config)` is now `SmartCrusherBuilder::new(config).with_default_oss_setup().build()` (zero behavior change). Crush calls fire observer events.
- `planning.rs`: `SmartCrusherPlanner` takes `&[Box<dyn Constraint>]`. The 4 hardcoded `detect_*` call sites are replaced by `apply_constraints(...)`. Pure refactor, identical preserved-index sets.

Why this shape
- `Constraint` (must-keep indices), `Observer` (events), `Scorer` (already exists). Everything else (TabularCompactor, allocator, formatter) lands in later PRs once we're sure of the seams.
- `SmartCrusherBuilder::new(config)` ships nothing. `with_default_oss_setup()` is the explicit OSS preset. No silent fallbacks: empty composition is honest.
- Enterprise crates can add `BusinessRuleConstraint`, `AuditObserver`, `LoopScorer` without touching `headroom-core` at all.

Test plan
- `make ci-precheck` green: ruff, mypy, cargo fmt/clippy/test (1.95.0), commitlint
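The "empty by default" builder and the once-per-crush observer event described in this PR can be sketched with a toy crusher. Everything below is a stand-in written for illustration: `Crusher`, `CrusherBuilder`, `RecordingObserver`, the truncation "strategy", and the `CrushEvent` field names are assumptions, not the types in `headroom-core`. Only the ideas come from the PR: a builder that attaches nothing unless asked, `add_observer`, and `on_event(&CrushEvent)` carrying strategy, sizes, and `elapsed_ns`.

```rust
use std::sync::{Arc, Mutex};
use std::time::Instant;

// Simplified stand-in for the event described in the PR
// (strategy + sizes + elapsed_ns); field names are assumptions.
#[derive(Debug, Clone)]
struct CrushEvent {
    strategy: &'static str,
    input_len: usize,
    output_len: usize,
    elapsed_ns: u128,
}

trait Observer {
    fn on_event(&self, event: &CrushEvent);
}

// Hypothetical observer that records events in memory, e.g. for tests.
struct RecordingObserver {
    seen: Arc<Mutex<Vec<CrushEvent>>>,
}

impl Observer for RecordingObserver {
    fn on_event(&self, event: &CrushEvent) {
        self.seen.lock().unwrap().push(event.clone());
    }
}

// Toy crusher: truncates input to `max_len` chars and notifies observers.
struct Crusher {
    max_len: usize,
    observers: Vec<Box<dyn Observer>>,
}

// The builder starts empty: nothing is attached unless the caller asks.
struct CrusherBuilder {
    max_len: usize,
    observers: Vec<Box<dyn Observer>>,
}

impl CrusherBuilder {
    fn new(max_len: usize) -> Self {
        Self { max_len, observers: Vec::new() }
    }

    fn add_observer(mut self, observer: Box<dyn Observer>) -> Self {
        self.observers.push(observer);
        self
    }

    fn build(self) -> Crusher {
        Crusher { max_len: self.max_len, observers: self.observers }
    }
}

impl Crusher {
    // Builds one event per top-level call and hands it to every observer.
    fn crush(&self, input: &str) -> String {
        let start = Instant::now();
        let output: String = input.chars().take(self.max_len).collect();
        let event = CrushEvent {
            strategy: "truncate",
            input_len: input.chars().count(),
            output_len: output.chars().count(),
            elapsed_ns: start.elapsed().as_nanos(),
        };
        for observer in &self.observers {
            observer.on_event(&event);
        }
        output
    }
}

// Usage: attach one observer, crush once, report output and event count.
fn demo() -> (String, usize) {
    let seen = Arc::new(Mutex::new(Vec::new()));
    let crusher = CrusherBuilder::new(5)
        .add_observer(Box::new(RecordingObserver { seen: Arc::clone(&seen) }))
        .build();
    let out = crusher.crush("hello world");
    let count = seen.lock().unwrap().len();
    (out, count)
}
```

A crusher built with `CrusherBuilder::new(n).build()` and no further calls has no observers and emits nothing, which mirrors the "empty composition is honest" choice: any default behavior has to be opted into by an explicit preset call rather than appearing silently.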