Model:
qwen3-235b-a22b
Config:
tp_size=4, dp_shard_size=64
cosmos_rl/policy/model/qwen3_vl_moe/__init__.py", line 893, in load_hf_weights
[rank130]: assert local_view.shape == sharded_weight.shape, (
[rank130]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank130]: AssertionError: Shape mismatch: torch.Size([594, 4096]) != torch.Size([593, 4096]) for lm_head.weight with original shape torch.Size([151936, 4096])
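The mismatch is consistent with uneven sharding: 151936 vocab rows do not divide evenly across the 256-way split implied by tp_size * dp_shard_size = 4 * 64 (151936 / 256 = 593.5), so some shards hold 594 rows and others 593. One side of the assertion appears to use the rounded-up per-shard size while the other uses the rounded-down size. A minimal sketch of the arithmetic (the 256-way split is an assumption about how lm_head.weight is partitioned here):

```python
# Shard-size arithmetic for lm_head.weight: torch.Size([151936, 4096])
vocab_size = 151936
num_shards = 4 * 64  # tp_size * dp_shard_size (assumed combined split)

# An even split is impossible: 151936 % 256 != 0
remainder = vocab_size % num_shards        # 128 shards get one extra row
ceil_rows = -(-vocab_size // num_shards)   # 594 rows on the larger shards
floor_rows = vocab_size // num_shards      # 593 rows on the smaller shards

print(ceil_rows, floor_rows, remainder)    # matches the 594 vs 593 mismatch
```

So rank 130 lands on a shard where the local view was allocated with the ceil size (594) but the incoming HF weight slice was cut to the floor size (593), tripping the shape assertion.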