Get field pushdown try 3 internal #53

Open
adriangb wants to merge 40 commits into main from get-field-pushdown-try-3-internal

@adriangb adriangb commented Feb 6, 2026

No description provided.

adriangb and others added 28 commits February 6, 2026 06:48
This PR adds a new optimizer rule `ExtractLeafExpressions` that extracts
`MoveTowardsLeafNodes` sub-expressions (like `get_field`) from Filter,
Sort, Limit, Aggregate, and Projection nodes into intermediate projections.

This normalization allows `OptimizeProjections` (which runs next) to merge
consecutive projections and push `get_field` expressions down to the scan,
enabling Parquet column pruning for struct fields.

Example transformation for projections:
```sql
SELECT id, s['label'] FROM t WHERE s['value'] > 150
```

Before: `get_field(s, 'label')` stayed in ProjectionExec, so the scan read the full struct.
After: both `get_field` expressions are pushed down to DataSourceExec.

The rule:
- Extracts `MoveTowardsLeafNodes` expressions into `__leaf_N` aliases
- Creates inner projections with extracted expressions + pass-through columns
- Creates outer projections to restore original schema names
- Handles deduplication of identical expressions
- Skips expressions already aliased with `__leaf_*` to ensure idempotency
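
The extraction and deduplication steps listed above can be sketched with a toy expression tree. This is only an illustration under stated assumptions: the `Expr` enum, the `Extractor` struct, and the choice to number aliases from 1 are hypothetical and are not DataFusion's types or the rule's actual code. The idempotency check for pre-existing `__leaf_*` aliases is not modeled.

```rust
use std::collections::HashMap;

/// Toy expression tree (hypothetical; NOT DataFusion's `Expr`).
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
enum Expr {
    Column(String),
    Literal(i64),
    /// Stand-in for `get_field(base, field)`.
    GetField(Box<Expr>, String),
    /// Stand-in for `left > right`.
    Gt(Box<Expr>, Box<Expr>),
}

/// Collects extracted expressions and hands out `__leaf_N` aliases.
#[derive(Default)]
struct Extractor {
    /// (expression, alias) pairs in first-seen order.
    pairs: Vec<(Expr, String)>,
    /// Lookup used to deduplicate identical expressions.
    seen: HashMap<Expr, String>,
}

impl Extractor {
    /// Replace every `GetField` subtree with a column reference to its alias.
    fn rewrite(&mut self, expr: &Expr) -> Expr {
        match expr {
            Expr::GetField(..) => Expr::Column(self.alias_for(expr)),
            Expr::Gt(l, r) => {
                Expr::Gt(Box::new(self.rewrite(l)), Box::new(self.rewrite(r)))
            }
            other => other.clone(),
        }
    }

    /// Return the existing alias for `expr`, or allocate the next one.
    /// Numbering from 1 is an assumption made for this sketch.
    fn alias_for(&mut self, expr: &Expr) -> String {
        if let Some(alias) = self.seen.get(expr) {
            return alias.clone();
        }
        let alias = format!("__leaf_{}", self.pairs.len() + 1);
        self.seen.insert(expr.clone(), alias.clone());
        self.pairs.push((expr.clone(), alias.clone()));
        alias
    }
}
```

In this sketch, `pairs` would feed the inner projection (extracted expressions plus pass-through columns), while the rewritten expression stays in the original node and refers to the alias.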

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implement `extract_from_join` to extract `MoveTowardsLeafNodes`
sub-expressions (like get_field) from Join nodes:

- Extract from `on` expressions (equijoin keys)
- Extract from `filter` expressions (non-equi conditions)
- Route extractions to appropriate side (left/right) based on columns
- Add recovery projection to restore original schema
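
Routing by column ownership can be illustrated roughly as follows. This is a minimal sketch, not the body of `extract_from_join`: the `Side` enum and the `route` function are hypothetical names, and columns are modeled as plain unqualified strings.

```rust
use std::collections::HashSet;

/// Which join input can evaluate an extracted expression (hypothetical type).
#[derive(Debug, PartialEq)]
enum Side {
    Left,
    Right,
    /// References both sides (or nothing), so it cannot be pushed to one input.
    Unroutable,
}

/// Decide where an extraction goes, given the columns it references and the
/// columns each join input produces.
fn route<'a>(
    referenced: &[&'a str],
    left_cols: &HashSet<&'a str>,
    right_cols: &HashSet<&'a str>,
) -> Side {
    if referenced.is_empty() {
        return Side::Unroutable;
    }
    if referenced.iter().all(|c| left_cols.contains(c)) {
        Side::Left
    } else if referenced.iter().all(|c| right_cols.contains(c)) {
        Side::Right
    } else {
        Side::Unroutable
    }
}
```

An expression such as `get_field(s, 'value')` references only `s`, so it would be routed to whichever input owns `s`; an expression mixing columns from both inputs stays in the join.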

Also adds unit tests and sqllogictest integration tests for:
- Join with get_field in equijoin condition
- Join with get_field in filter (WHERE clause)
- Join with extractions from both sides
- Left join with get_field extraction
- Baseline join without extraction

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When `find_extraction_target` returns a Projection that renames columns
(e.g. `user AS x`), both `build_extraction_projection` and
`merge_into_extracted_projection` were adding extracted expressions that
reference the target's output columns (e.g. `col("x")`) to a projection
evaluated against the target's input (which only has `user`).

Fix by resolving extracted expressions and columns_needed through the
projection's rename mapping using `replace_cols_by_name` before merging.
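
The idea behind the fix can be sketched with a toy resolver. The `Expr` enum and `resolve` function below are hypothetical stand-ins for illustration; they do not reproduce the signature or behavior of `replace_cols_by_name`.

```rust
use std::collections::HashMap;

/// Toy expression tree (hypothetical; NOT DataFusion's `Expr`).
#[derive(Clone, Debug, PartialEq)]
enum Expr {
    Column(String),
    /// Stand-in for `get_field(base, field)`.
    GetField(Box<Expr>, String),
}

/// Rewrite column references from a projection's output names to the input
/// expressions that produce them. Columns without a mapping are left as-is.
fn resolve(expr: &Expr, renames: &HashMap<String, Expr>) -> Expr {
    match expr {
        Expr::Column(name) => renames
            .get(name)
            .cloned()
            .unwrap_or_else(|| expr.clone()),
        Expr::GetField(base, field) => {
            Expr::GetField(Box::new(resolve(base, renames)), field.clone())
        }
    }
}
```

With a projection `user AS x`, the mapping `x -> user` turns an extraction over `x` into one over `user`, which is valid against the projection's input.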

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
adriangb and others added 12 commits February 6, 2026 07:20
- Push extraction projections recursively through intermediate
  (recovery) projections to reach filters/sorts/limits in one pass
- Guard merge against dropping uncaptured expressions (e.g. CSE's
  __common_expr aliases), fixing schema errors in optimize_projections
- Eliminate redundant Column aliases by comparing unqualified name
  instead of schema_name() which includes the qualifier
- Update projection_pushdown.slt: query that previously hit a schema
  error now optimizes and executes correctly
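
The alias comparison in the third bullet can be illustrated with a toy column type. This is a sketch under assumptions: `Column`, `qualified_name`, and `alias_is_redundant` are hypothetical names, and `qualified_name` only approximates what a qualifier-including schema name looks like.

```rust
/// Toy column reference (hypothetical; NOT DataFusion's `Column`).
struct Column {
    relation: Option<String>,
    name: String,
}

impl Column {
    /// Qualified display name such as `t.a`; includes the qualifier.
    fn qualified_name(&self) -> String {
        match &self.relation {
            Some(r) => format!("{}.{}", r, self.name),
            None => self.name.clone(),
        }
    }
}

/// An alias adds nothing if it equals the column's unqualified name.
/// Comparing against `qualified_name()` would miss this for `t.a AS a`.
fn alias_is_redundant(col: &Column, alias: &str) -> bool {
    col.name == alias
}
```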

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…eryAlias, etc.)

Replace the catch-all barrier in try_push_input() with a generic
try_push_into_inputs() that routes extraction expressions to the
correct input by column ownership. This enables get_field pushdown
through Joins, so that `SELECT s['value'] FROM t1 JOIN t2` pushes the
get_field down to DataSourceExec.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Unify the Filter/Sort/Limit and SubqueryAlias match arms into the
generic try_push_into_inputs path, reducing push_extraction_pairs
from 4 arms to 2 (Projection merge + catch-all).

Key changes:
- Add SubqueryAlias qualifier remap in try_push_into_inputs so
  extraction pairs are rewritten from alias-space to input-space
  before routing
- Add broadcast routing for Union nodes (clone pairs to all inputs)
  vs exclusive routing for Join/single-input nodes
- Remove find_extraction_target and rebuild_path (no longer needed)
- Add is_pure_extraction_projection guard on the Projection merge arm
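
The difference between broadcast and exclusive routing might look roughly like this. It is a simplified sketch, not the code of `try_push_into_inputs`: `Pair`, `Routing`, and `route_pairs` are hypothetical, columns are plain strings, and the qualifier remapping the real code performs is not modeled.

```rust
/// One extraction: (columns it references, alias it is published under).
type Pair = (Vec<String>, String);

/// How extraction pairs are distributed over a node's inputs (hypothetical).
enum Routing {
    /// Each pair goes to the single input that owns all of its columns.
    Exclusive,
    /// Every input receives a clone of every pair (Union-style).
    Broadcast,
}

/// Returns one list of pairs per input, or `None` if some pair cannot be
/// assigned to exactly one owning input under exclusive routing.
fn route_pairs(
    pairs: &[Pair],
    inputs: &[Vec<String>],
    routing: Routing,
) -> Option<Vec<Vec<Pair>>> {
    let mut out = vec![Vec::new(); inputs.len()];
    match routing {
        Routing::Broadcast => {
            for slot in out.iter_mut() {
                *slot = pairs.to_vec();
            }
        }
        Routing::Exclusive => {
            for pair in pairs {
                // First input whose columns cover everything the pair needs.
                let owner = inputs
                    .iter()
                    .position(|cols| pair.0.iter().all(|c| cols.contains(c)))?;
                out[owner].push(pair.clone());
            }
        }
    }
    Some(out)
}
```

Returning `None` here stands in for "do not push": the sketch treats a pair that spans inputs as a barrier rather than an error.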

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add test coverage for get_field extraction through SubqueryAlias
(Section 14) and UNION ALL (Section 15) in projection_pushdown.slt.

Fix broadcast routing for Union nodes: remap column qualifiers from
Union-output-space to each input's qualifier space so extraction
projections reference the correct qualified column names.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…Option return

Make build_extraction_projection return Result<Option<LogicalPlan>> instead
of requiring callers to check has_extractions() first. Remove the now-unused
has_extractions() method.
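
The shape of this API change can be sketched as follows. The `Projection` struct, the `String` error type, and the function body are hypothetical placeholders; only the `Result<Option<...>>` return pattern reflects the commit message.

```rust
/// Toy projection node (hypothetical; NOT DataFusion's `LogicalPlan`).
#[derive(Debug, PartialEq)]
struct Projection {
    exprs: Vec<String>,
}

/// `Ok(None)` means "nothing was extracted", so callers no longer need a
/// separate `has_extractions()` check before calling.
fn build_extraction_projection(
    extracted: &[String],
    passthrough: &[String],
) -> Result<Option<Projection>, String> {
    if extracted.is_empty() {
        return Ok(None);
    }
    // Illustrative failure mode only; the real error conditions differ.
    if extracted.iter().any(|e| e.is_empty()) {
        return Err("empty extraction expression".to_string());
    }
    let mut exprs = extracted.to_vec();
    exprs.extend_from_slice(passthrough);
    Ok(Some(Projection { exprs }))
}
```

A caller can then write `if let Some(proj) = build_extraction_projection(..)? { .. }` and fall through unchanged when nothing was extracted.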

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove `manual_pairs` Vec and `manual_columns` IndexSet by inserting
pre-existing `__extracted` aliases directly into the extractor's
IndexMap. The full `Expr::Alias(…)` is used as the key so the alias
name participates in equality — this prevents collisions when CSE
rewrites produce the same inner expression under different alias names.
When building the final extraction_pairs, the Alias wrapper is stripped
so consumers see the usual `(inner_expr, alias_name)` tuples.
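
Why the alias name has to participate in key equality can be shown with a small std-only sketch. Everything here is an assumption for illustration: the toy `Expr` enum, the `Vec`-backed `Extractor` standing in for an `IndexMap`, and the `__extracted_1` / `__extracted_2` alias names.

```rust
/// Toy expression tree (hypothetical; NOT DataFusion's `Expr`).
#[derive(Clone, Debug, PartialEq, Eq)]
enum Expr {
    Column(String),
    /// Stand-in for `get_field(base, field)`.
    GetField(Box<Expr>, String),
    /// Stand-in for `expr AS name`.
    Alias(Box<Expr>, String),
}

/// Insertion-ordered map keyed by expression (std-only stand-in for `IndexMap`).
#[derive(Default)]
struct Extractor {
    entries: Vec<(Expr, String)>,
}

impl Extractor {
    /// Insert unless an equal key is already present.
    fn insert(&mut self, key: Expr, alias: &str) {
        if !self.entries.iter().any(|(k, _)| *k == key) {
            self.entries.push((key, alias.to_string()));
        }
    }

    /// Strip any `Alias` wrapper so consumers see `(inner_expr, alias_name)`.
    fn extraction_pairs(&self) -> Vec<(Expr, String)> {
        self.entries
            .iter()
            .map(|(k, alias)| match k {
                Expr::Alias(inner, _) => ((**inner).clone(), alias.clone()),
                other => (other.clone(), alias.clone()),
            })
            .collect()
    }
}
```

Keyed by the full `Alias`, two pre-existing aliases over the same inner expression remain distinct entries; keyed by the inner expression alone, the second would be silently dropped.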

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>