Skip to content

tp: replace the handrolled preprocessor with syntaqlite#5472

Draft
LalitMaganti wants to merge 18 commits intomainfrom
dev/lalitm/synq-kill-preprocessor
Draft

tp: replace the handrolled preprocessor with syntaqlite#5472
LalitMaganti wants to merge 18 commits intomainfrom
dev/lalitm/synq-kill-preprocessor

Conversation

@LalitMaganti
Copy link
Copy Markdown
Member

@LalitMaganti LalitMaganti commented Apr 15, 2026

Now that the parser is backed by syntaqlite, macro expansion and
statement splitting can live alongside the rest of parsing instead of
behind a separate preprocessor pass. This CL:

  • Deletes src/trace_processor/perfetto_sql/preprocessor/ and its
    generated grammar entirely.
  • Moves macro bookkeeping (the Macro struct + registry) onto
    PerfettoSqlParser and introduces intrinsic_macro_expansion.{cc,h}
    for the built-in macros previously implemented by the preprocessor.
  • Teaches the parser unittest to drive the new parser-resident
    expansion directly.
  • Updates SqlSource with a FromMacroExpansion constructor so
    SQLite errors traceback back to the macro call site.
  • Drops the preprocessor dependency from the engine and rewires the
    parser unittests target against syntaqlite.

Also bundles the bump of vendored syntaqlite (hash +
--macro-style rust) since the new macro-lookup/rewrite API surface
only has a consumer once the preprocessor is gone.

@LalitMaganti LalitMaganti requested a review from a team as a code owner April 15, 2026 14:21
@LalitMaganti LalitMaganti marked this pull request as draft April 15, 2026 14:23
@LalitMaganti LalitMaganti force-pushed the dev/lalitm/synq-vendor branch from 3dfc93b to 8c6b320 Compare April 16, 2026 15:31
@LalitMaganti LalitMaganti force-pushed the dev/lalitm/synq-kill-preprocessor branch from 44bcdca to 5be3e1e Compare April 16, 2026 15:31
@LalitMaganti LalitMaganti changed the title perfetto: fix YAML syntax in release workflows tp: kill the handrolled PerfettoSQL preprocessor Apr 16, 2026
@LalitMaganti LalitMaganti changed the title tp: kill the handrolled PerfettoSQL preprocessor tp: replace the handrolled PerfettoSQL preprocessor with syntaqlite Apr 16, 2026
@LalitMaganti LalitMaganti changed the title tp: replace the handrolled PerfettoSQL preprocessor with syntaqlite tp: replace the handrolled preprocessor with syntaqlite Apr 16, 2026
@LalitMaganti LalitMaganti force-pushed the dev/lalitm/synq-vendor branch from 8c6b320 to d85eb80 Compare April 16, 2026 18:50
Bump the pinned syntaqlite SHA to pull in several fixes that obsolete
local workarounds:

- #109: SyntaqliteTextSpan.length widened to uint32_t — drop
  (uint16_t) casts on select_span initializers.
- #111: extract_terminals_from_y excludes base grammar terminals —
  removes the need for the local LP/RP filter.
- #112: generated amalgamation self-suppresses compiler warnings —
  drop the BUILD.gn cflags_c list and the #pragma wrappers around
  the header include in the tokenizer.
- #113: WINDOW/OVER/FILTER treated as context-sensitive keywords —
  drop the local FILTER fallback in perfetto_module_name.
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 16, 2026

Pull platform-specific prebuilt syntaqlite CLI archives from the v0.4.1
GitHub release rather than cloning the source tree and invoking
`cargo build`. This avoids needing a Rust toolchain just to regenerate
the PerfettoSQL parser.

The new `--syntaqlite` flag on tools/install-build-deps gates fetching
the archive (only contributors editing the grammar need it); normal
builds continue to consume the committed generated output.
`tools/gen_syntaqlite_parser` now directs users to run
`tools/install-build-deps --syntaqlite` when the binary is missing, and
the regenerated `syntaqlite_perfetto.{c,h}` reflect v0.4.1.
Rebuilds syntaqlite_perfetto.{c,h} from a newer syntaqlite snapshot
(adds straddle tracking and cross-layer poisoning, among other fixes)
and teaches the gen_syntaqlite_parser wrapper to pass
--macro-style rust.
Now that the parser is backed by syntaqlite, macro expansion and
statement splitting can live alongside the rest of parsing instead of
behind a separate preprocessor pass.  This CL:

 * Deletes src/trace_processor/perfetto_sql/preprocessor/ and its
   generated grammar entirely.
 * Moves macro bookkeeping (the Macro struct + registry) onto
   PerfettoSqlParser and introduces intrinsic_macro_expansion.{cc,h}
   for the built-in macros previously implemented by the preprocessor.
 * Teaches the parser unittest to drive the new parser-resident
   expansion directly.
 * Updates SqlSource with a FromMacroExpansion constructor so SQLite
   errors traceback back to the macro call site.
 * Drops the preprocessor dependency from the engine and rewires the
 parser unittests target against syntaqlite.

The base::TrimWhitespace(std::string_view) overload used by
intrinsic_macro_expansion.cc will land in a separate CL; this branch
will be unbuildable until it does.
@LalitMaganti LalitMaganti force-pushed the dev/lalitm/synq-kill-preprocessor branch from 47e96e3 to c412a68 Compare April 16, 2026 22:55
@LalitMaganti
Copy link
Copy Markdown
Member Author

LalitMaganti commented Apr 16, 2026

The ExpandsMacroInsideArg crash is partly an upstream syntaqlite issue: the header comment says body_call_length == 0 signals an arg-internal call, but the implementation actually uses SYNTAQLITE_MACRO_BODY_CALL_ARG_INTERNAL (= UINT32_MAX).

So BuildForUserMacro:

if (c.body_call_length == 0) continue;

never trips for the arg-internal case.

LalitMaganti added a commit to LalitMaganti/syntaqlite that referenced this pull request Apr 16, 2026
The public header, internal header, and session.rs test comment all
claimed the arg-internal case was signalled by `body_call_length == 0`,
but the implementation sets both `body_call_offset` and `body_call_length`
to `SYNTAQLITE_MACRO_BODY_CALL_ARG_INTERNAL` (UINT32_MAX) — and that is
what the public sentinel macro documents at parser.h:236.

Downstream consumers (e.g. google/perfetto#5472) that trusted the stale
comment ended up pushing UINT32_MAX offsets into their rewriter and
crashing on overflow. Pure doc fix — no behavior change.
LalitMaganti added a commit to LalitMaganti/syntaqlite that referenced this pull request Apr 16, 2026
## Motivation

The public header, internal header, and a session.rs test comment all
said the arg-internal case for a nested macro call was signalled by
`body_call_length == 0`. That contradicts the implementation:
`parser_macros.c:530-535` sets *both* `body_call_offset` and
`body_call_length` to `SYNTAQLITE_MACRO_BODY_CALL_ARG_INTERNAL` (=
`UINT32_MAX`), matching the sentinel defined at `parser.h:236` and the
Rust docs at `types.rs:274-277`.

Downstream consumers that trusted the stale comment can crash. Concrete
example: google/perfetto#5472 filtered with `if (c.body_call_length ==
0) continue;`, which never trips for arg-internal calls, so a
`RewriteItem{start = UINT32_MAX, end = UINT32_MAX + UINT32_MAX}` ended
up in `SqlSource::Rewriter` and tripped its `start <= end` check.

## Changes

Pure documentation fix — no behavior change:

- `syntaqlite-syntax/include/syntaqlite/parser.h` — refer to
`SYNTAQLITE_MACRO_BODY_CALL_ARG_INTERNAL` (UINT32_MAX) as the
arg-internal sentinel on both fields.
- `syntaqlite-syntax/csrc/parser_internal.h` — same fix on the
`SynqExpansionLayer` fields.
- `syntaqlite-syntax/src/parser/session.rs` — update the test comment
that still said `body_call_length == 0`.
BuildForUserMacro relied on syntaqlite's body_call_offset /
body_call_length to locate nested calls inside a user macro's authored
body.  Two problems:

 * The header comment claims body_call_length == 0 signals an
   arg-internal call, but the implementation uses
   SYNTAQLITE_MACRO_BODY_CALL_ARG_INTERNAL (= UINT32_MAX).  The old
   `== 0` check never fired, so arg-internal children were pushed as
   rewrites with UINT32_MAX ranges, crashing ExpandsMacroInsideArg.

 * The per-segment shift upstream is computed as
   `max(0, sub_length - body_length)`, which silently drops the signed
   delta whenever a substitution is *shorter* than its authored
   placeholder (e.g. `$y` (2) → `5` (1)).  body_call_offset /
   body_call_length are off-by-one whenever that happens, producing
   garbled expansions like `SELECT 5 +(10 + 1))`.

Work around both in the consumer: compare against the real sentinel
and recompute the body range ourselves by walking the parent's arg
segments with signed deltas.  A TODO points at the upstream fix so
we can drop the workaround once the vendored syntaqlite is re-rolled.
LalitMaganti added a commit that referenced this pull request Apr 20, 2026
Replaces the handrolled Lemon grammar under `perfetto_sql/grammar/`
and the vendored-SQLite-based tokenizer in
`perfetto_sql/tokenizer/tokenize_internal.c` with the standalone
syntaqlite parser (pulled in via `buildtools/syntaqlite`).

`PerfettoSqlParser` now walks the syntaqlite AST for PerfettoSQL
statements (`CREATE PERFETTO {TABLE,VIEW,FUNCTION,MACRO,INDEX}`,
`INCLUDE PERFETTO MODULE`, etc.) instead of its own recursive-descent
code, and defers to SQLite for everything else.

The legacy preprocessor is retained as-is on this PR; killing it is
stacked on top in #5472.

Codegen tooling moves from `tools/update_sql_parsers.py` to
`tools/gen_syntaqlite_parser`.
Base automatically changed from dev/lalitm/synq-vendor to main April 20, 2026 10:38
v0.5.1 restructures the `dialect` subcommand into `dialect generate`,
so update tools/gen_syntaqlite_parser accordingly and regenerate the
amalgamated PerfettoSQL parser (syntaqlite_perfetto.{c,h}).
syntaqlite v0.5.1 changed every layer-0 offset it emits (node extents,
spans, macro rewrite call offsets, error offsets) from document-relative
to statement-relative, measured from the start of the slice returned by
the new syntaqlite_parser_text API.

Thread the statement's document offset through MacroRewriteBuilder,
ParseCreateMacro, and the error path, adding it when slicing
impl_->source so each stmt.Substr call lands at the right document
position. Avoids materializing an extra SqlSource slice per Next().
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant