
feat: support parsing stored procedures for bigquery dialect#761

Draft
AlexJBSilva wants to merge 2 commits into reata:master from AlexJBSilva:parse_bigquery_procs

@AlexJBSilva

Addresses (in part, and specifically for the BigQuery dialect) what was asked in Discussion #556 and Issue #643.

This is accomplished by adding a case for multi_statement_segment in the _list_specific_statement_segment method of the SqlFluffLineageAnalyzer class. That case uses recursive_crawl("statement") to extract every statement from a multi-statement segment, then filters the results so that only statement types supported by sqllineage are processed. Unsupported statements are skipped without warnings or exceptions, because procedures can contain many statements that are meaningless for lineage construction.
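The crawl-and-filter idea can be sketched on a toy parse tree. This is not sqlfluff: Segment is a hypothetical stand-in whose recursive_crawl only loosely mimics sqlfluff's method of the same name, the tree shape is a simplified assumption, and SUPPORTED_TYPES is an illustrative subset rather than sqllineage's real list.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Segment:
    """Hypothetical stand-in for a parsed segment; not sqlfluff's class."""
    type: str
    raw: str = ""
    segments: List["Segment"] = field(default_factory=list)

    def recursive_crawl(self, seg_type: str) -> Iterator["Segment"]:
        # Yield this segment if it matches, then keep descending so that
        # statements nested inside other statements are found as well.
        if self.type == seg_type:
            yield self
        for child in self.segments:
            yield from child.recursive_crawl(seg_type)


# Illustrative subset of lineage-relevant statement types (an assumption,
# not sqllineage's actual list).
SUPPORTED_TYPES = {
    "select_statement",
    "insert_statement",
    "merge_statement",
    "create_table_statement",
}


def lineage_statements(root: Segment) -> List[Segment]:
    """Return inner statements worth analyzing; skip everything else silently."""
    result = []
    for stmt in root.recursive_crawl("statement"):
        inner = stmt.segments[0] if stmt.segments else None
        if inner is not None and inner.type in SUPPORTED_TYPES:
            result.append(inner)
    return result


# Simplified (assumed) tree for a procedure wrapped in multi_statement_segment.
proc = Segment("file", segments=[
    Segment("multi_statement_segment", segments=[
        Segment("create_procedure_statement", segments=[
            Segment("statement", segments=[
                Segment("declare_segment", "DECLARE x INT64")]),
            Segment("statement", segments=[
                Segment("insert_statement", "INSERT INTO t SELECT * FROM s")]),
        ]),
    ]),
])

print([s.raw for s in lineage_statements(proc)])  # ['INSERT INTO t SELECT * FROM s']
```

The DECLARE statement is dropped silently rather than raising, which matches the "no warnings or exceptions" behavior described above.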

To cope with the many statements inside a multi-statement segment, I reused the parse-and-cache strategy from the analyzer.split_tsql() method. (That method should probably be renamed, since the approach looks useful for parsing procedures in other dialects as well.)
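The general parse-and-cache pattern can be sketched independently of sqllineage: parse the whole script once, return each statement's text, and keep the already-parsed statement in a dict keyed by that text so later analysis looks it up instead of re-parsing. ProcedureSplitter, Parsed, and the naive semicolon-splitting parser are all hypothetical; this is not the actual split_tsql() code, and a real parser would not split naively on ";".

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Parsed:
    """Toy parse result: the statement text plus its whitespace-split tokens."""
    raw: str
    tokens: Tuple[str, ...]


class ProcedureSplitter:
    """Hypothetical parse-once, look-up-later helper."""

    def __init__(self) -> None:
        self._cache: Dict[str, Parsed] = {}
        self.parse_calls = 0  # counts how often the (expensive) parser runs

    def _parse(self, script: str) -> List[Parsed]:
        # Stand-in for a real parser; naive ';' splitting for illustration only.
        self.parse_calls += 1
        parts = [p.strip() for p in script.split(";")]
        return [Parsed(p, tuple(p.split())) for p in parts if p]

    def split(self, script: str) -> List[str]:
        """Parse the whole script once and cache each statement by its text."""
        texts = []
        for stmt in self._parse(script):
            self._cache[stmt.raw] = stmt
            texts.append(stmt.raw)
        return texts

    def analyze(self, text: str) -> Parsed:
        """Return the cached parse; only re-parse on a cache miss."""
        key = text.strip()
        if key not in self._cache:
            self._cache[key] = self._parse(key)[0]
        return self._cache[key]


splitter = ProcedureSplitter()
texts = splitter.split("DECLARE x INT64; INSERT INTO t SELECT * FROM s;")
first_tokens = [splitter.analyze(t).tokens[0] for t in texts]
print(texts, first_tokens, splitter.parse_calls)
```

Because every analyze() call after split() hits the cache, the parser runs once for the whole script no matter how many statements it contains.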

This pull request is in draft mode because I have only just gotten stored procedure parsing working for the BigQuery dialect.
Any thoughts on getting this PR ready to merge are welcome.

AlexJBSilva force-pushed the parse_bigquery_procs branch from 2d509e3 to 5836345 on April 1, 2026 01:31
@AlexJBSilva
Author

AlexJBSilva commented Apr 2, 2026

This is still a work in progress: the test passes with sqlfluff 4.0.4 but fails with sqlfluff 4.1.0.
My implementation relied on the CREATE PROCEDURE statement being wrapped in multi_statement_segment, a type exclusive to sqlfluff's BigQuery dialect. In sqlfluff/sqlfluff#7534, that wrapping changed to the common statement type.
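A toy illustration of why the test broke: a lookup keyed only on the old wrapper type finds nothing once the wrapper becomes statement. The nested-dict trees below are simplified assumptions about the two parse shapes, not real sqlfluff output, and find_procedures and wrap are hypothetical helpers.

```python
from typing import Any, Dict, Iterator, List, Tuple

Node = Dict[str, Any]  # toy node: {"type": str, "children": [Node, ...]}

# Wrapper types around CREATE PROCEDURE as described above: the BigQuery-only
# type before the sqlfluff change and the generic type after it.
OLD_WRAPPER = "multi_statement_segment"
NEW_WRAPPER = "statement"


def walk(node: Node) -> Iterator[Node]:
    """Depth-first traversal over the toy tree."""
    yield node
    for child in node.get("children", []):
        yield from walk(child)


def find_procedures(tree: Node, wrappers: Tuple[str, ...]) -> List[Node]:
    """Collect create_procedure_statement nodes whose parent has a wrapper type."""
    found = []
    for node in walk(tree):
        if node["type"] in wrappers:
            found.extend(
                c for c in node.get("children", [])
                if c["type"] == "create_procedure_statement"
            )
    return found


def wrap(wrapper_type: str) -> Node:
    """Build a simplified file containing one wrapped CREATE PROCEDURE."""
    proc = {"type": "create_procedure_statement", "children": []}
    return {"type": "file", "children": [{"type": wrapper_type, "children": [proc]}]}


old_tree, new_tree = wrap(OLD_WRAPPER), wrap(NEW_WRAPPER)

# A check keyed only on the old wrapper silently finds nothing in the new shape.
print(len(find_procedures(new_tree, (OLD_WRAPPER,))))  # 0
# Matching the generic wrapper finds the procedure in the new shape.
print(len(find_procedures(new_tree, (NEW_WRAPPER,))))  # 1
```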

@reata
Owner

reata commented Apr 10, 2026

Hi @AlexJBSilva, thanks for contributing, and sorry it took me a while.

Here's some early feedback:

  1. Regarding the sqlfluff behavior change: this is not new. We rely on sqlfluff's internal details, so it's a tax we pay. To get this moving, we can upgrade sqlfluff to >=4.1.0 and then implement this feature using the new parse result. Actually, I think statement makes more sense than multi_statement_segment.
  2. I don't like that we share the logic with TSQL_NO_SEMICOLON. As documented in https://sqllineage.readthedocs.io/en/latest/gear_up/configuration.html#tsql-no-semicolon, this config is to be removed in the long term, and split_tsql will be removed along with it.

Putting implementation effort aside, my question about stored procedures is this: should we split one CREATE PROCEDURE statement into multiple SQL statements, or tackle it as a single statement? Personally I'd prefer the latter approach, but let me know your thoughts.
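To make the two options concrete, here is a toy comparison. Each statement inside a procedure is reduced to a hypothetical (source tables, target tables) pair. The split approach yields one lineage result per inner statement, while the single-statement approach folds them into one result for the whole CREATE PROCEDURE. The Lineage class, the table names, and the merge rule (union of source-to-target edges) are assumptions for illustration, not sqllineage's data model.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Set, Tuple

Edge = Tuple[str, str]  # (source table, target table)
# Each inner statement reduced to (source tables, target tables); hypothetical.
Body = List[Tuple[FrozenSet[str], FrozenSet[str]]]


@dataclass
class Lineage:
    """Hypothetical lineage result: a set of table-to-table edges."""
    edges: Set[Edge] = field(default_factory=set)


def statement_lineage(sources: FrozenSet[str], targets: FrozenSet[str]) -> Lineage:
    return Lineage({(s, t) for s in sources for t in targets})


def split_approach(body: Body) -> List[Lineage]:
    # One result per inner statement, as if each were a standalone SQL statement.
    return [statement_lineage(src, tgt) for src, tgt in body]


def single_statement_approach(body: Body) -> Lineage:
    # One result for the whole CREATE PROCEDURE: the union of all inner edges.
    merged = Lineage()
    for part in split_approach(body):
        merged.edges |= part.edges
    return merged


body: Body = [
    (frozenset({"raw.orders"}), frozenset({"tmp.orders"})),
    (frozenset(), frozenset()),  # e.g. DECLARE or SET: no lineage
    (frozenset({"tmp.orders", "raw.customers"}), frozenset({"mart.sales"})),
]

print(len(split_approach(body)))  # 3
print(sorted(single_statement_approach(body).edges))
```

In the merged result, tmp.orders shows up as an intermediate node linking raw.orders to mart.sales in one graph; with the split approach, the caller would have to stitch the per-statement results together.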
