Purpose: Regression-test Codemap internals by comparing codemap query output to checked-in expectations (or subset matchers) on fixed corpora. This is not an LLM-in-the-loop eval. For latency and token cost versus file scanning, see benchmark.md.
Operational docs: CONTRIBUTING § Golden queries · benchmark § Fixtures · Runner: scripts/query-golden.ts · Schema: scripts/query-golden/schema.ts
| Goal | How scenarios help |
|---|---|
| Catch regressions | Parser or schema drift → JSON diff vs golden |
| Encode good answers | Human-reviewed rows for representative queries |
| Stress realistic size | Optional second corpus beyond fixtures/minimal |
| Stay deterministic | Assertions on query output, not model prose |
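The "JSON diff vs golden" idea in the table above can be illustrated with a small sketch. It is not the code in scripts/query-golden.ts; `canonical` and `diffRows` are hypothetical names, and the row shape (plain objects keyed by column name) is an assumption. Sorting object keys before serializing keeps the comparison deterministic even if column order changes.

```typescript
// Hypothetical sketch: a row is assumed to be a plain object keyed by column name.
type Row = Record<string, unknown>;

// Serialize a value with object keys sorted, so key order never causes a diff.
function canonical(value: unknown): string {
  return JSON.stringify(value, (_key, v) => {
    if (v && typeof v === "object" && !Array.isArray(v)) {
      const source = v as Record<string, unknown>;
      const sorted: Record<string, unknown> = {};
      for (const k of Object.keys(source).sort()) {
        sorted[k] = source[k];
      }
      return sorted;
    }
    return v;
  });
}

// Compare query output to golden rows position by position.
// Returns human-readable differences; an empty array means the rows match.
function diffRows(actual: Row[], golden: Row[]): string[] {
  const diffs: string[] = [];
  if (actual.length !== golden.length) {
    diffs.push(`row count: expected ${golden.length}, got ${actual.length}`);
  }
  const shared = Math.min(actual.length, golden.length);
  for (let i = 0; i < shared; i++) {
    const a = canonical(actual[i]);
    const g = canonical(golden[i]);
    if (a !== g) {
      diffs.push(`row ${i}: expected ${g}, got ${a}`);
    }
  }
  return diffs;
}
```

For example, `diffRows([{ a: 1, b: 2 }], [{ b: 2, a: 1 }])` returns an empty array, while a changed value yields one entry naming the row index. Row order still matters in this sketch, so queries would need a stable `ORDER BY`.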
- Chat / SSE / auth harnesses — out of scope here
- Proving agents follow rules — measure in the IDE or another project
- Replacing src/benchmark.ts: that stays SQL vs glob/read time; goldens add correctness snapshots
| Piece | Role |
|---|---|
| fixtures/minimal/ | Tier A corpus; stable for CI |
| src/benchmark.ts | Speed comparison (not golden row equality) |
| bun test | Unit tests for parsers, CLI, DB |
| CODEMAP_ROOT / --root | Index any tree; Tier B uses env + optional gitignore |
We do not commit another product’s source tree, paths, business strings, or golden JSON derived from a private app (or any repo we do not own and license for redistribution).
| Safe to commit here | Not committed here |
|---|---|
| fixtures/minimal/ (trees we control) | Clones of private apps |
| Generic SQL / --recipe ids | App-specific path literals in assets |
| Goldens from our fixtures only | Snapshots keyed to proprietary names |
| Abstract prompt text (intent labels) | Verbatim customer prompts |
Tier B: Point CODEMAP_ROOT at a local clone; goldens for that tree stay gitignored (or private automation) — see .gitignore and benchmark § Tier B.
| Tier | Corpus | When | Purpose |
|---|---|---|---|
| A | fixtures/minimal (in-repo) | Every PR / bun run check | Fast, committed goldens |
| B | Local path via CODEMAP_* | Maintainer machine | Scale; goldens optional / gitignored |
| B′ (future) | Public OSS fixture only | CI optional | Larger committed corpus if license OK |
Scenarios live in fixtures/golden/scenarios.json (Tier A) or optional scenarios.external.json / example (Tier B). Each entry has id, sql or recipe, optional match (exact, minRows, everyRowContains), optional budgetMs. Goldens: fixtures/golden/minimal/*.json etc. Refresh: bun scripts/query-golden.ts --update.
Prompts in JSON are intent labels, not pasted chat logs — pair with queries whose literals come from fixture-owned data (see fixtures/qa/prompts.external.template.md for optional chat QA).
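The scenario fields described above could be modeled roughly as follows. This is a sketch under stated assumptions, not the Zod schema in scripts/query-golden/schema.ts: the field types, the rule that exactly one of `sql` or `recipe` is set, the default to exact comparison when `match` is absent, and the helper names `validateScenario` and `checkRows` are all hypothetical.

```typescript
// Hypothetical shapes; the authoritative schema is scripts/query-golden/schema.ts.
type Row = Record<string, unknown>;

interface Match {
  exact?: boolean;         // assumed: compare rows to the golden file verbatim
  minRows?: number;        // assumed: lower bound on the number of rows
  everyRowContains?: Row;  // assumed: key/value pairs every row must include
}

interface Scenario {
  id: string;
  sql?: string;
  recipe?: string;
  match?: Match;
  budgetMs?: number;
}

// Structural checks on one scenario entry (assumption: sql and recipe are exclusive).
function validateScenario(s: Scenario): string[] {
  const errors: string[] = [];
  if (!s.id) errors.push("id is required");
  const hasSql = typeof s.sql === "string";
  const hasRecipe = typeof s.recipe === "string";
  if (hasSql === hasRecipe) errors.push(`${s.id}: set exactly one of sql or recipe`);
  return errors;
}

// Apply the matcher to query output; an empty result means the scenario passes.
function checkRows(match: Match | undefined, rows: Row[], golden: Row[]): string[] {
  const m: Match = match ?? { exact: true }; // assumed default
  const failures: string[] = [];
  // Note: JSON.stringify is sensitive to key order; fine for a sketch.
  if (m.exact && JSON.stringify(rows) !== JSON.stringify(golden)) {
    failures.push("rows differ from golden");
  }
  if (m.minRows !== undefined && rows.length < m.minRows) {
    failures.push(`expected at least ${m.minRows} rows, got ${rows.length}`);
  }
  const required = m.everyRowContains;
  if (required) {
    rows.forEach((row, i) => {
      for (const key of Object.keys(required)) {
        if (row[key] !== required[key]) failures.push(`row ${i}: ${key} does not match`);
      }
    });
  }
  return failures;
}
```

Subset matchers such as `minRows` and `everyRowContains` let a scenario assert a property of the result without pinning every row, which suits Tier B corpora where committed goldens are not available.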
| Area | State |
|---|---|
| Tier A runner + CI | bun run test:golden in check |
| Tier A scenario coverage | 15 scenarios across all indexed tables: files, symbols, imports, exports, components, dependencies, markers, type_members, calls, CSS vars/classes/keyframes |
| Tier B external + schema | test:golden:external, Zod in scripts/query-golden/schema.ts |
| Subset matchers + budgets | match, budgetMs, --strict-budget |
| Optional CI for public corpus | Deferred — roadmap § Backlog |
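The interaction between `budgetMs` and `--strict-budget` listed in the table above is not spelled out in this document. One plausible reading, offered only as an assumption, is that an over-budget query produces a warning by default and a failure when the strict flag is set. The `checkBudget` helper and `BudgetResult` type below are hypothetical and do not describe the actual runner.

```typescript
// Hypothetical sketch of budget handling; the semantics here are assumed, not documented.
interface BudgetResult {
  status: "ok" | "warn" | "fail";
  message?: string;
}

// elapsedMs is measured by the caller around the query; budgetMs is optional per scenario.
function checkBudget(
  id: string,
  elapsedMs: number,
  budgetMs: number | undefined,
  strictBudget: boolean,
): BudgetResult {
  if (budgetMs === undefined || elapsedMs <= budgetMs) {
    return { status: "ok" };
  }
  const message = `${id}: ${elapsedMs.toFixed(1)} ms exceeds budget of ${budgetMs} ms`;
  // Assumption: strict mode turns a budget overrun into a hard failure.
  return { status: strictBudget ? "fail" : "warn", message };
}
```

Keeping the overrun non-fatal by default would avoid flaky CI on slow runners, while the strict mode would suit a maintainer machine with stable timing.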
- benchmark.md — speed methodology, Tier B, fixtures
- architecture.md — schema, parsers
- roadmap.md — backlog