Refactor sharding dump to support custom mesh rule and diverse sharding alternatives #3639
Merged
copybara-service[bot] merged 5 commits into main on Apr 13, 2026
Codecov Report: ✅ All modified and coverable lines are covered by tests.
51c177e to c69196e
khatwanimohit approved these changes, Apr 13, 2026
c69196e to 640eecb
640eecb to eed004f
gobbleturk approved these changes, Apr 13, 2026
Description
TL;DR: This PR refactors the sharding dump test cases to reduce redundancy, eliminates the rigid Cartesian product structure, and adds support for testing custom_mesh_and_rule alongside explicit sharding-related flags.
Background
The current sharding dump test in MaxText is highly valuable: it dumps the logical and physical sharding specs of specific test cases from AOT compilation. For subsequent code changes, AOT is triggered again and the details are compared to ensure sharding behavior hasn't unintentionally broken.
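The dump-and-compare idea can be sketched as a small golden-file check. This is only an illustration under stated assumptions, not MaxText's implementation: the function name `check_against_golden`, the JSON file layout, and the parameter names in the usage note are all hypothetical, and the dump is assumed to be a plain mapping from parameter name to a list of mesh-axis names.

```python
import json
from pathlib import Path

_MISSING = object()  # distinguishes "key absent" from an explicit null spec


def check_against_golden(current: dict, golden_path: Path, update: bool = False) -> list:
    """Return the sorted keys whose sharding spec differs from the stored dump.

    Hypothetical helper: on the first run (or when update=True) the current
    dump is written as the new baseline and no differences are reported.
    """
    if update or not golden_path.exists():
        golden_path.write_text(json.dumps(current, indent=2, sort_keys=True))
        return []
    golden = json.loads(golden_path.read_text())
    all_keys = golden.keys() | current.keys()
    return sorted(
        key
        for key in all_keys
        if golden.get(key, _MISSING) != current.get(key, _MISSING)
    )
```

In use, a first call with a baseline such as `{"decoder/mlp/wi": ["fsdp", None]}` records the file, and a later call with a changed spec returns the names of the entries that drifted, which a test can then assert to be empty.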
However, the existing test cases lack diversity and are unnecessarily redundant. We currently test a rigid Cartesian product of:
- MODEL_NAMES = ["deepseek2-16b", "qwen3-0.6b", "gpt-oss-20b"]
- TOPOLOGIES = ["tpu7x-16", "v6e-16", "v5p-16"]
- SLICES = [1, 4]

Because these tests all run with the default sharding settings (full FSDP and DP_DCN), the Cartesian approach inflates test runtimes without expanding our coverage of different sharding strategies.
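To make the size of that product concrete, the three lists above can be expanded with `itertools.product`. The variable name `cartesian_cases` is an assumption for illustration; the list values are taken from the description.

```python
from itertools import product

MODEL_NAMES = ["deepseek2-16b", "qwen3-0.6b", "gpt-oss-20b"]
TOPOLOGIES = ["tpu7x-16", "v6e-16", "v5p-16"]
SLICES = [1, 4]

# Every (model, topology, slice count) combination: 3 * 3 * 2 = 18 cases,
# all of which exercise the same default sharding settings.
cartesian_cases = list(product(MODEL_NAMES, TOPOLOGIES, SLICES))
print(len(cartesian_cases))  # 18
```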
What this PR does
This PR drops the previous Cartesian product structure in favor of explicitly defined test combinations. This gives us the flexibility to test a much wider variety of sharding configurations. We now support the following degrees of freedom:
- MODEL_NAMES = ["deepseek2-16b", "qwen3-0.6b", "gpt-oss-20b"]
- TOPOLOGIES = ["tpu7x-16", "v6e-16", "v5p-16"]
- SLICES = [1, 4]
- custom_mesh_and_rule = [default, "pure-fsdp", "pipeline-large-moe"]
- explicit sharding-related flags (e.g. ici_fsdp_parallelism=2, ici_expert_parallelism=2, use_ring_of_experts=true)

Example of the new test cases structure:
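A rough sketch of what an explicit list could look like, not the PR's actual code: the `ShardingCase` dataclass, its field names, the `case_id` helper, and the particular combinations chosen are assumptions for illustration, and the combinations have not been checked against MaxText. Only the model, topology, slice, rule, and flag values come from the description.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ShardingCase:
    """Hypothetical container for one explicitly chosen sharding dump case."""

    model_name: str
    topology: str
    num_slices: int
    custom_mesh_and_rule: Optional[str] = None  # None means the default mesh and rules
    extra_flags: Tuple[str, ...] = ()  # explicit sharding-related overrides

    def case_id(self) -> str:
        """Build a readable, unique identifier for test parametrization."""
        parts = [self.model_name, self.topology, f"slice{self.num_slices}"]
        if self.custom_mesh_and_rule:
            parts.append(self.custom_mesh_and_rule)
        parts.extend(self.extra_flags)
        return "-".join(parts)


# Each entry is picked deliberately instead of being generated by a product.
TEST_CASES = [
    ShardingCase("deepseek2-16b", "tpu7x-16", 1),
    ShardingCase("qwen3-0.6b", "v6e-16", 4, custom_mesh_and_rule="pure-fsdp"),
    ShardingCase("deepseek2-16b", "v5p-16", 4, custom_mesh_and_rule="pipeline-large-moe"),
    ShardingCase(
        "gpt-oss-20b",
        "v6e-16",
        1,
        extra_flags=(
            "ici_fsdp_parallelism=2",
            "ici_expert_parallelism=2",
            "use_ring_of_experts=true",
        ),
    ),
]
```

With a layout like this, covering a new mesh rule is a one-line addition to `TEST_CASES`, and `case_id()` gives each case a stable name for its dump file or test ID.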
By moving to this explicit list, it becomes much easier to add targeted test cases in the future. For example, when a new custom mesh and rule is introduced, a specific test case can be appended to TEST_CASES to guard against future regressions.
Since custom meshes and rules are covered by the sharding dump test after this change, this PR deprecates the previous custom_mesh_and_axes unit test.
Checklist
Before submitting this PR, please make sure (put X in square brackets):
gemini-review label.