chore(deps): upgrade megatron-core, megatron-bridge, sglang, vllm #1206
garrett4wade wants to merge 1 commit into main from
Code Review
This pull request introduces a new upgrade-deps skill and a comprehensive set of API compatibility checklists for core dependencies like megatron-core, transformers, sglang, and vllm. It updates several package versions in pyproject.toml and pyproject.vllm.toml, including a bump to megatron-core 0.17.0 and the addition of Python 3.12 version markers. Feedback identifies a missing architecture suffix in the Dockerfile base image and recommends consistent application of the Python 3.12 marker to the mbridge package across both project manifests.
  # docker build --build-arg VARIANT=vllm -t areal-runtime:dev-vllm . # vllm variant
- FROM lmsysorg/sglang:v0.5.9-cu129-amd64-runtime
+ FROM lmsysorg/sglang:v0.5.10.post1-runtime
The base image tag v0.5.10.post1-runtime is missing the -cu129-amd64 suffix mentioned in the pull request description. This discrepancy might lead to pulling an incorrect image variant (e.g., a different CUDA version or architecture), which could cause runtime issues or build failures for C++ extensions that depend on specific CUDA headers.
FROM lmsysorg/sglang:v0.5.10.post1-cu129-amd64-runtime
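A check like the one below could catch this kind of tag drift before a build. It is only a sketch: the tag layout is inferred from the two tags quoted in this thread rather than from any published naming scheme, and `parse_tag` and `is_fully_pinned` are hypothetical helper names.

```python
import re

# Assumed tag layout: v<version>[-cu<cuda>][-<arch>]-runtime.
# Inferred from the tags quoted in this review, not from an official spec.
TAG_PATTERN = re.compile(
    r"^v(?P<version>\d+(?:\.\d+)*(?:\.post\d+)?)"
    r"(?:-cu(?P<cuda>\d+))?"
    r"(?:-(?P<arch>amd64|arm64))?"
    r"-runtime$"
)


def parse_tag(tag: str) -> dict:
    """Split a tag into its version, CUDA, and architecture parts."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"unrecognized tag layout: {tag}")
    return match.groupdict()


def is_fully_pinned(tag: str) -> bool:
    """True when the tag names both a CUDA version and an architecture."""
    parts = parse_tag(tag)
    return parts["cuda"] is not None and parts["arch"] is not None


if __name__ == "__main__":
    for tag in ("v0.5.10.post1-runtime", "v0.5.10.post1-cu129-amd64-runtime"):
        print(tag, "fully pinned:", is_fully_pinned(tag))
```

Whether the unsuffixed tag actually resolves to a different CUDA version or architecture depends on how the image publisher maintains its tags, which this sketch does not verify.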
  megatron = [
- "megatron-core==0.16.0; sys_platform == 'linux' and platform_machine == 'x86_64'",
+ "megatron-core==0.17.0; python_version >= '3.12' and sys_platform == 'linux' and platform_machine == 'x86_64'",
  "mbridge==0.15.1; sys_platform == 'linux' and platform_machine == 'x86_64'",
The mbridge package is missing the python_version >= '3.12' marker, which was added to megatron-core and megatron-bridge in this pull request. Since these packages are part of the same functional group and share the same upstream platform requirements, the marker should be applied consistently to avoid resolution issues or partial installations in Python 3.11 environments.
- "mbridge==0.15.1; sys_platform == 'linux' and platform_machine == 'x86_64'",
+ "mbridge==0.15.1; python_version >= '3.12' and sys_platform == 'linux' and platform_machine == 'x86_64'",
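The partial-installation concern can be illustrated with a small sketch. It evaluates only the `python_version >= 'X.Y'` clause and assumes the other marker clauses hold (a Linux x86_64 host); `applies_on` and `selected` are hypothetical helpers, and a real check would more likely rely on a full PEP 508 marker evaluator such as the one in the `packaging` library.

```python
import re

# The two requirement lines quoted in the diff above.
REQUIREMENTS = [
    "megatron-core==0.17.0; python_version >= '3.12' and sys_platform == 'linux' and platform_machine == 'x86_64'",
    "mbridge==0.15.1; sys_platform == 'linux' and platform_machine == 'x86_64'",
]

# Matches only a lower bound on python_version; other operators are ignored.
MIN_PYTHON = re.compile(r"python_version\s*>=\s*'(\d+)\.(\d+)'")


def applies_on(requirement: str, python: tuple) -> bool:
    """Check only the python_version clause; other clauses are assumed true."""
    _, _, marker = requirement.partition(";")
    match = MIN_PYTHON.search(marker)
    if match is None:
        return True  # no Python floor, so the line applies on every version
    floor = (int(match.group(1)), int(match.group(2)))
    return python >= floor


def selected(requirements: list, python: tuple) -> list:
    """Names of the packages kept for the given Python version."""
    return [
        req.split("==")[0]
        for req in requirements
        if applies_on(req, python)
    ]


if __name__ == "__main__":
    print("3.11:", selected(REQUIREMENTS, (3, 11)))
    print("3.12:", selected(REQUIREMENTS, (3, 12)))
```

Under these assumptions, a Python 3.11 environment keeps `mbridge` while dropping `megatron-core`, which is the partial installation described above.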
  megatron = [
- "megatron-core==0.16.0; sys_platform == 'linux' and platform_machine == 'x86_64'",
+ "megatron-core==0.17.0; python_version >= '3.12' and sys_platform == 'linux' and platform_machine == 'x86_64'",
  "mbridge==0.15.1; sys_platform == 'linux' and platform_machine == 'x86_64'",
The mbridge package is missing the python_version >= '3.12' marker, which is inconsistent with the other megatron-related packages in this extra. This should be updated to match the constraints applied in the default pyproject.toml to ensure consistent dependency resolution across variants.
- "mbridge==0.15.1; sys_platform == 'linux' and platform_machine == 'x86_64'",
+ "mbridge==0.15.1; python_version >= '3.12' and sys_platform == 'linux' and platform_machine == 'x86_64'",
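Keeping the two manifests aligned could be checked mechanically. The sketch below compares the markers of the same pinned package across two requirement lists; `normalize` and `marker_drift` are hypothetical helpers, and the lists stand in for the `megatron` group of each manifest, with the first already carrying the suggested fix and the second not yet.

```python
# Stand-in for the megatron group in pyproject.toml, with the fix applied.
DEFAULT_GROUP = [
    "megatron-core==0.17.0; python_version >= '3.12' and sys_platform == 'linux' and platform_machine == 'x86_64'",
    "mbridge==0.15.1; python_version >= '3.12' and sys_platform == 'linux' and platform_machine == 'x86_64'",
]

# Stand-in for the same group in pyproject.vllm.toml, before the fix.
VLLM_GROUP = [
    "megatron-core==0.17.0; python_version >= '3.12' and sys_platform == 'linux' and platform_machine == 'x86_64'",
    "mbridge==0.15.1; sys_platform == 'linux' and platform_machine == 'x86_64'",
]


def normalize(requirement: str) -> tuple:
    """Return (pinned spec, marker) with whitespace collapsed."""
    spec, _, marker = requirement.partition(";")
    return spec.strip(), " ".join(marker.split())


def marker_drift(default_group: list, variant_group: list) -> list:
    """Pinned specs present in both groups whose markers differ."""
    default = dict(normalize(req) for req in default_group)
    variant = dict(normalize(req) for req in variant_group)
    return sorted(
        spec
        for spec in default.keys() & variant.keys()
        if default[spec] != variant[spec]
    )


if __name__ == "__main__":
    print("marker drift:", marker_drift(DEFAULT_GROUP, VLLM_GROUP))
```

The comparison is purely textual, so two markers that are logically equivalent but written in a different clause order would be reported as drift.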
4470eaa to 9b09beb
3b869e8 to b180e0c
Port checkpoint handling from custom hf_load/hf_save to megatron-bridge, update Dockerfile for uv sync with dependency-metadata support, and fix VLM rollout issues.
Key changes:
- Upgrade megatron-core, megatron-bridge, sglang, vllm in pyproject.toml
- Replace areal/models/mcore/hf_load.py and hf_save.py with megatron-bridge
- Switch Dockerfile to uv sync for dependency-metadata override support
- Add nvidia-modelopt[hf] and cppimport dependency-metadata override
- Fix VLM RL with mm_token_type_ids handling
- Fix reward timeout configuration
- Consolidate upgrade skills into single upgrade-deps skill
- Update docs for bridge backend, tree training, and installation
3899666 to 60a27d1
Description
Upgrade focused runtime dependencies across both SGLang and vLLM variants (fixing #1189):
Dependency Resolution
- transformers 5.0–5.3, vllm 0.19.1 excludes 5.0–5.5. SGLang variant resolves to 5.3.0. vLLM variant overrides megatron-bridge's constraint, resolves to 5.5.4.
- sglang:v0.5.10.post1-cu129-amd64-runtime.
API Compatibility Audit
All 6 updated focused packages audited against upstream source — 0 breaking changes:
- save_hf_adapter now native (monkey-patch dead code)
Risk Areas
- lmsysorg/sglang:v0.5.10.post1-cu129-amd64-runtime tag needs CI validation
- <=5.3.0 ceiling: minor runtime risk
Related Issue
N/A — routine dependency upgrade
Type of Change
Checklist
pre-commit run --all-files) ./docs/build_all.sh) main /review-pr command /create-pr
Breaking Change Details:
- megatron-core and megatron-bridge now require Python ≥3.12 (upstream requirement). Python 3.11 environments will not install megatron packages.
- transformers upgraded from 4.x to 5.x (SGLang: 5.3.0, vLLM: 5.5.4). Any custom code depending on removed 4.x APIs may need updates.
Additional Context
- upgrade-summary.md (gitignored, generated during upgrade)
- upgrade-deps skill with per-package API checklists is included for future upgrades
- _monkey_patch_save_hf_adapter() in megatron_lora.py is now dead code (megatron-bridge 0.4.0 ships native save_hf_adapter). Can be cleaned up in a follow-up PR.
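
To illustrate why a compatibility patch turns into dead code once upstream ships the method, here is a generic sketch of a guarded monkey patch. It does not reproduce the code in megatron_lora.py; `patch_if_missing`, `OldBridge`, `NewBridge`, and `fallback_save_hf_adapter` are hypothetical names invented for the example.

```python
def patch_if_missing(target, name, fallback) -> bool:
    """Attach fallback as target.<name> only when no such attribute exists."""
    if hasattr(target, name):
        return False  # native implementation present, so the patch is unused
    setattr(target, name, fallback)
    return True


class OldBridge:
    """Hypothetical stand-in for a release without the method."""


class NewBridge:
    """Hypothetical stand-in for a release that ships the method."""

    def save_hf_adapter(self):
        return "native"


def fallback_save_hf_adapter(self):
    """Hypothetical fallback that a compatibility patch would install."""
    return "patched"


if __name__ == "__main__":
    for cls in (OldBridge, NewBridge):
        applied = patch_if_missing(cls, "save_hf_adapter", fallback_save_hf_adapter)
        print(cls.__name__, "patched:", applied, "->", cls().save_hf_adapter())
```

With a guard of this shape, the patch never fires against a release that already provides the method, which is why removing it in a follow-up should be low risk.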