Model:
qwen3-235b-a22b
Config:
tp_size=4, dp_shard_size=64
cosmos_rl/policy/model/qwen3_vl_moe/__init__.py", line 893, in load_hf_weights
[rank130]: assert local_view.shape == sharded_weight.shape, (
[rank130]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank130]: AssertionError: Shape mismatch: torch.Size([594, 4096]) != torch.Size([593, 4096]) for lm_head.weight with original shape torch.Size([151936, 4096])
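The mismatch is consistent with uneven sharding: 151936 vocab rows do not divide evenly across the 256-way split implied by tp_size * dp_shard_size = 4 * 64 (151936 / 256 = 593.5), so some shards hold 594 rows and others 593. One side of the assertion appears to use the rounded-up per-shard size while the other uses the rounded-down size. A minimal sketch of the arithmetic (the 256-way split is an assumption about how lm_head.weight is partitioned here):

```python
# Shard-size arithmetic for lm_head.weight: torch.Size([151936, 4096])
vocab_size = 151936
num_shards = 4 * 64  # tp_size * dp_shard_size (assumed combined split)

# An even split is impossible: 151936 % 256 != 0
remainder = vocab_size % num_shards        # 128 shards get one extra row
ceil_rows = -(-vocab_size // num_shards)   # 594 rows on the larger shards
floor_rows = vocab_size // num_shards      # 593 rows on the smaller shards

print(ceil_rows, floor_rows, remainder)    # matches the 594 vs 593 mismatch
```

So rank 130 lands on a shard where the local view was allocated with the ceil size (594) but the incoming HF weight slice was cut to the floor size (593), tripping the shape assertion.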