add support for bigtable materialized view source#38056
psud wants to merge 1 commit into apache:master
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces support for reading from Bigtable Continuous Materialized Views within the Bigtable IO Java connector. It provides a new configuration option that is mutually exclusive with the existing table ID parameter, updates the underlying service implementation to correctly construct resource paths for materialized views, and includes necessary validation and testing to ensure robust integration.
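To make the summary concrete, here is a minimal sketch of the two behaviors it describes: rejecting a configuration that sets both (or neither) of the table ID and the materialized view ID, and building the matching resource path. This is not the code from the PR. The class name `ReadTargetSketch`, the method `resourcePath`, and its parameter names are hypothetical, and the `materializedViews` path segment is an assumption based on the Bigtable resource naming pattern.

```java
/** Hypothetical helper, not the PR's code: validates the read target and builds its path. */
final class ReadTargetSketch {
  private ReadTargetSketch() {}

  /**
   * Returns the resource path for the configured read target.
   * Exactly one of tableId and materializedViewId may be non-empty.
   */
  static String resourcePath(
      String projectId, String instanceId, String tableId, String materializedViewId) {
    boolean hasTable = tableId != null && !tableId.isEmpty();
    boolean hasView = materializedViewId != null && !materializedViewId.isEmpty();
    // The two options are mutually exclusive: both set or neither set is an error.
    if (hasTable == hasView) {
      throw new IllegalArgumentException(
          "Exactly one of tableId or materializedViewId must be set");
    }
    String instancePath = "projects/" + projectId + "/instances/" + instanceId;
    // Assumed path segments: "tables" for tables, "materializedViews" for views.
    return hasTable
        ? instancePath + "/tables/" + tableId
        : instancePath + "/materializedViews/" + materializedViewId;
  }

  public static void main(String[] args) {
    System.out.println(resourcePath("my-project", "my-instance", null, "my-view"));
  }
}
```

Failing fast at configuration time keeps the error close to the misconfigured option instead of surfacing it later as a failed read request.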
R: @kennknowles or R: @tonytanger, as you two are the most frequent committers on the BigTableIO.java file. Would you be able to take a look at this PR, please? It uses all the same mechanisms that the normal Bigtable source uses to read from tables, but adds options for the materialized view endpoints and uses them. Please also let me know if this doesn't qualify as a "small" PR and I should instead email dev@beam.apache.org to discuss it.
Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control.

R: @tonytanger
@@ -139,6 +140,7 @@ static class BigtableReaderImpl implements Reader {
    private final String projectId;
    private final String instanceId;
    private final String tableId;
Would it be worth adding an integration test? See BigtableReadIT for the table read tests.
Good call out with the integration test, @stankiewicz! Looking at the existing

Pattern 2: Create the table and materialized view as part of the test, which is similar to how testE2EBigtableSegmentRead works. Creating an MV as part of test setup isn't ideal, as materialized views can take a few minutes to initialize and populate, so the test would need to poll and wait, making it significantly slower than the existing table-based tests.

@mutianf, I saw that you last edited a lot of the e2e tests and may have access to the

Alternatively, if @stankiewicz knows anyone else who could know more about this, I'd love some input.
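For reference, the "poll and wait" step mentioned above could look roughly like the sketch below. It is plain Java with no Bigtable calls. The class `PollUntilReady`, the method `awaitReady`, and the readiness check passed in as a `BooleanSupplier` are all hypothetical; how a test would actually detect that a materialized view is populated is not specified in this thread.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

/** Hypothetical test helper: polls a readiness check until it passes or a timeout expires. */
final class PollUntilReady {
  private PollUntilReady() {}

  /**
   * Calls isReady repeatedly, sleeping for interval between attempts.
   * Returns true once the check passes, or false on timeout or interruption.
   */
  static boolean awaitReady(BooleanSupplier isReady, Duration timeout, Duration interval) {
    long deadline = System.nanoTime() + timeout.toNanos();
    while (true) {
      if (isReady.getAsBoolean()) {
        return true;
      }
      long remainingNanos = deadline - System.nanoTime();
      if (remainingNanos <= 0) {
        return false;
      }
      // Sleep for the interval, but never past the deadline and never less than 1 ms.
      long sleepMillis = Math.min(interval.toMillis(), Math.max(1L, remainingNanos / 1_000_000L));
      try {
        Thread.sleep(sleepMillis);
      } catch (InterruptedException e) {
        // Preserve the interrupt flag for the caller and give up waiting.
        Thread.currentThread().interrupt();
        return false;
      }
    }
  }
}
```

A test would call something like `awaitReady(() -> viewHasRows(), Duration.ofMinutes(10), Duration.ofSeconds(15))`, where `viewHasRows` is a placeholder for whatever readiness check the test can perform. The multi-minute timeout is the cost that makes this pattern slower than the table-based tests.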
I've learned from the Bigtable product team that materialized views currently return structured row keys without a public decoding mapping. This makes it impossible to process row keys in Beam as originally intended: reading all records is possible, but reading row ranges is not reliably possible. I'm putting this PR on hold until that prerequisite is addressed service-side, unless anyone with knowledge of this sees another way to proceed without the structured row keys in place. A workaround is documented in the issue associated with this PR.
Thanks Patrick, do you have a timeline for the prerequisite work?
Pattern 2 is probably better, as contributors may want to run tests outside of
So the prerequisite is providing a client library that will allow serializing
To read the structured row keys, one option right now is to query the materialized view with Bigtable SQL: https://docs.cloud.google.com/bigtable/docs/introduction-sql. Structured row key documentation and examples are here: https://docs.cloud.google.com/bigtable/docs/structured-row-key-queries. The Bigtable Java client library has native support for SQL queries. However, I think it would take quite a bit of work to add a new reader that handles SQL correctly.
Adds support in the Bigtable IO Java connector for Bigtable's new materialized views. Fixes #38053
Testing