This PR adds a new optimizer rule `ExtractLeafExpressions` that extracts `MoveTowardsLeafNodes` sub-expressions (like `get_field`) from Filter, Sort, Limit, Aggregate, and Projection nodes into intermediate projections.

This normalization allows `OptimizeProjections` (which runs next) to merge consecutive projections and push `get_field` expressions down to the scan, enabling Parquet column pruning for struct fields.

Example transformation for projections:

```sql
SELECT id, s['label'] FROM t WHERE s['value'] > 150
```

Before: `get_field(s, 'label')` stayed in ProjectionExec, reading full struct

After: Both `get_field` expressions pushed to DataSourceExec

The rule:
- Extracts `MoveTowardsLeafNodes` expressions into `__leaf_N` aliases
- Creates inner projections with extracted expressions + pass-through columns
- Creates outer projections to restore original schema names
- Handles deduplication of identical expressions
- Skips expressions already aliased with `__leaf_*` to ensure idempotency

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
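The extraction and deduplication steps listed above can be sketched on a toy expression tree. This is only an illustration under stated assumptions: the `Expr` enum, the `Extractor` struct, and the 1-based `__leaf_N` numbering are hypothetical stand-ins, not the types or numbering used by the actual rule.

```rust
use std::collections::HashMap;

/// Toy expression tree (hypothetical; a stand-in for the real logical expression type).
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
enum Expr {
    Column(String),
    Literal(i64),
    /// Struct field access such as `s['value']`.
    GetField(Box<Expr>, String),
    Gt(Box<Expr>, Box<Expr>),
}

/// Collects extracted sub-expressions and assigns each one a `__leaf_N` alias.
#[derive(Default)]
struct Extractor {
    /// (expression, alias) pairs in first-seen order; these would feed the inner projection.
    pairs: Vec<(Expr, String)>,
    /// Lookup from expression to alias, used to deduplicate identical expressions.
    seen: HashMap<Expr, String>,
}

impl Extractor {
    /// Rewrites `expr`, replacing each outermost `GetField` with a column
    /// reference to its alias.
    fn rewrite(&mut self, expr: &Expr) -> Expr {
        match expr {
            Expr::GetField(..) => Expr::Column(self.alias_for(expr)),
            Expr::Gt(l, r) => Expr::Gt(Box::new(self.rewrite(l)), Box::new(self.rewrite(r))),
            other => other.clone(),
        }
    }

    /// Returns the existing alias for `expr`, or registers a new one.
    fn alias_for(&mut self, expr: &Expr) -> String {
        if let Some(alias) = self.seen.get(expr) {
            return alias.clone();
        }
        // Assumption: aliases are numbered from 1 in first-seen order.
        let alias = format!("__leaf_{}", self.pairs.len() + 1);
        self.seen.insert(expr.clone(), alias.clone());
        self.pairs.push((expr.clone(), alias.clone()));
        alias
    }
}
```

Rewriting the predicate `s['value'] > 150` with this sketch yields a comparison against the column `__leaf_1`, and a second occurrence of the same `get_field` reuses that alias instead of creating a new pair.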
Implement `extract_from_join` to extract `MoveTowardsLeafNodes` sub-expressions (like `get_field`) from Join nodes:
- Extract from `on` expressions (equijoin keys)
- Extract from `filter` expressions (non-equi conditions)
- Route extractions to appropriate side (left/right) based on columns
- Add recovery projection to restore original schema

Also adds unit tests and sqllogictest integration tests for:
- Join with `get_field` in equijoin condition
- Join with `get_field` in filter (WHERE clause)
- Join with extractions from both sides
- Left join with `get_field` extraction
- Baseline join without extraction

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
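Routing an extraction to one side of a join by the columns it references might look roughly like the following. The `Side` enum and `route` function are hypothetical names for illustration, and columns are modeled as plain qualified strings rather than the real column type.

```rust
use std::collections::HashSet;

/// Which join input an extracted expression should be pushed into.
#[derive(Debug, PartialEq, Eq)]
enum Side {
    Left,
    Right,
}

/// Returns the side whose schema covers every column in `needed`.
/// Returns `None` when the expression references no columns or spans both
/// inputs, in which case it cannot be pushed below the join.
fn route<'a>(
    needed: &[&'a str],
    left: &HashSet<&'a str>,
    right: &HashSet<&'a str>,
) -> Option<Side> {
    if needed.is_empty() {
        return None;
    }
    if needed.iter().all(|c| left.contains(c)) {
        Some(Side::Left)
    } else if needed.iter().all(|c| right.contains(c)) {
        Some(Side::Right)
    } else {
        None
    }
}
```

In this sketch an expression over `a.s` alone goes left, one over `b.s` alone goes right, and one that mixes `a.s` and `b.s` stays above the join.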
When `find_extraction_target` returns a Projection that renames columns
(e.g. `user AS x`), both `build_extraction_projection` and
`merge_into_extracted_projection` were adding extracted expressions that
reference the target's output columns (e.g. `col("x")`) to a projection
evaluated against the target's input (which only has `user`).
Fix by resolving extracted expressions and columns_needed through the
projection's rename mapping using `replace_cols_by_name` before merging.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
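The rename problem can be illustrated with a toy resolver. This is not DataFusion's `replace_cols_by_name`; the `Expr` enum and `resolve` function below are hypothetical, and the field name `name` is made up for the example. The sketch only shows the idea of mapping an output column such as `x` back to the input expression (`user`) that produces it.

```rust
use std::collections::HashMap;

/// Toy expression tree (hypothetical stand-in for the real expression type).
#[derive(Clone, Debug, PartialEq)]
enum Expr {
    Column(String),
    GetField(Box<Expr>, String),
}

/// Rewrites `expr` from the projection's output space into its input space:
/// every column that names a projection output is replaced by the input
/// expression behind it. Columns without a mapping are left unchanged.
fn resolve(expr: &Expr, renames: &HashMap<String, Expr>) -> Expr {
    match expr {
        Expr::Column(name) => renames
            .get(name)
            .cloned()
            .unwrap_or_else(|| expr.clone()),
        Expr::GetField(base, field) => {
            Expr::GetField(Box::new(resolve(base, renames)), field.clone())
        }
    }
}
```

With the mapping `x -> user`, the extracted expression `get_field(x, 'name')` resolves to `get_field(user, 'name')`, which is valid against the projection's input.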
- Push extraction projections recursively through intermediate (recovery) projections to reach filters/sorts/limits in one pass
- Guard merge against dropping uncaptured expressions (e.g. CSE's `__common_expr` aliases), fixing schema errors in `optimize_projections`
- Eliminate redundant Column aliases by comparing unqualified name instead of `schema_name()` which includes the qualifier
- Update projection_pushdown.slt: query that previously hit a schema error now optimizes and executes correctly

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…eryAlias, etc.)

Replace the catch-all barrier in `try_push_input()` with a generic `try_push_into_inputs()` that routes extraction expressions to the correct input by column ownership. This enables `get_field` pushdown through Joins so `SELECT s['value'] FROM t1 JOIN t2` reaches DataSourceExec.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Unify the Filter/Sort/Limit and SubqueryAlias match arms into the generic `try_push_into_inputs` path, reducing `push_extraction_pairs` from 4 arms to 2 (Projection merge + catch-all).

Key changes:
- Add SubqueryAlias qualifier remap in `try_push_into_inputs` so extraction pairs are rewritten from alias-space to input-space before routing
- Add broadcast routing for Union nodes (clone pairs to all inputs) vs exclusive routing for Join/single-input nodes
- Remove `find_extraction_target` and `rebuild_path` (no longer needed)
- Add `is_pure_extraction_projection` guard on the Projection merge arm

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add test coverage for `get_field` extraction through SubqueryAlias (Section 14) and UNION ALL (Section 15) in projection_pushdown.slt.

Fix broadcast routing for Union nodes: remap column qualifiers from Union-output-space to each input's qualifier space so extraction projections reference the correct qualified column names.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…Option return

Make `build_extraction_projection` return `Result<Option<LogicalPlan>>` instead of requiring callers to check `has_extractions()` first. Remove the now-unused `has_extractions()` method.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove `manual_pairs` Vec and `manual_columns` IndexSet by inserting pre-existing `__extracted` aliases directly into the extractor's IndexMap.

The full `Expr::Alias(…)` is used as the key so the alias name participates in equality. This prevents collisions when CSE rewrites produce the same inner expression under different alias names. When building the final `extraction_pairs`, the Alias wrapper is stripped so consumers see the usual `(inner_expr, alias_name)` tuples.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
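A rough, std-only sketch of why the full alias is used as the key follows. The real code uses an `IndexMap`; here a `Vec` plus `HashMap` stands in for it. The `Expr` enum, the `Extracted` struct, and alias names such as `__extracted_1` are hypothetical.

```rust
use std::collections::HashMap;

/// Toy expression tree (hypothetical stand-in for the real expression type).
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
enum Expr {
    Column(String),
    Alias(Box<Expr>, String),
}

/// Insertion-ordered set of keys; a std-only stand-in for `IndexMap`.
#[derive(Default)]
struct Extracted {
    order: Vec<Expr>,
    index: HashMap<Expr, usize>,
}

impl Extracted {
    /// Inserts `key` unless an equal key is already present.
    /// Because the key is the whole `Alias`, the alias name is part of equality.
    fn insert(&mut self, key: Expr) {
        if !self.index.contains_key(&key) {
            self.index.insert(key.clone(), self.order.len());
            self.order.push(key);
        }
    }

    /// Strips the `Alias` wrapper so callers see `(inner_expr, alias_name)` tuples.
    fn pairs(&self) -> Vec<(Expr, String)> {
        self.order
            .iter()
            .filter_map(|key| match key {
                Expr::Alias(inner, name) => Some(((**inner).clone(), name.clone())),
                _ => None,
            })
            .collect()
    }
}
```

If the key were only the inner expression, two pre-existing aliases over the same column would collapse into one entry and one alias name would be lost. Keyed by the full alias, both survive, while a repeated insert of the same alias is still deduplicated.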