chore(deps): upgrade megatron-core, megatron-bridge, sglang, vllm#1206

Open
garrett4wade wants to merge 1 commit into main from chore/upgrade-deps-2026-04

garrett4wade (Collaborator) commented Apr 18, 2026

Description

Upgrade focused runtime dependencies across both SGLang and vLLM variants (fixes #1189):

| Package | Old | New | Variant(s) |
|---|---|---|---|
| megatron-core | 0.16.0 | 0.17.0 | both |
| megatron-bridge | 0.3.0 | 0.4.0 | both |
| sglang | 0.5.9 | 0.5.10.post1 | sglang |
| vllm | 0.17.0 | 0.19.1 | vllm |
| transformers | 4.57.1 | 5.3.0 / 5.5.4 | sglang / vllm |
| peft | 0.18.1 | 0.19.1 | vllm (transitive) |

Dependency Resolution

  • transformers divergence: megatron-bridge 0.4.0 requires transformers 5.0–5.3, while vllm 0.19.1 excludes 5.0–5.5. The SGLang variant resolves to 5.3.0; the vLLM variant overrides megatron-bridge's constraint and resolves to 5.5.4.
  • Python 3.12 markers: megatron-core 0.17.0 and megatron-bridge 0.4.0 require Python ≥3.12. Added markers so Python 3.11 envs skip megatron packages.
  • flash-attn-4: New dependency from sglang 0.5.10.post1 (pure Python CUTE DSL). Added explicit prerelease pin in override-dependencies.
  • Dockerfile: Base image updated to sglang:v0.5.10.post1-cu129-amd64-runtime.
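
To make the transformers divergence concrete, here is a minimal sketch in plain Python. It assumes megatron-bridge 0.4.0's constraint can be modeled as an inclusive window from 5.0.0 to 5.3.0 (matching the `<=5.3.0` ceiling mentioned under Risk Areas). The helper names `parse_version` and `within_bridge_window` are hypothetical, and actual resolution is performed by the package resolver, not by code like this.

```python
# Sketch only: the window bounds below are an assumption based on the PR text.
BRIDGE_MIN = (5, 0, 0)
BRIDGE_MAX = (5, 3, 0)


def parse_version(text: str) -> tuple[int, ...]:
    """Turn a plain 'X.Y.Z' release string into a tuple of ints for comparison."""
    return tuple(int(part) for part in text.split("."))


def within_bridge_window(version: str) -> bool:
    """True if the version falls inside the assumed megatron-bridge window."""
    return BRIDGE_MIN <= parse_version(version) <= BRIDGE_MAX


# Versions each variant resolves to, as listed in the PR description.
resolved = {"sglang": "5.3.0", "vllm": "5.5.4"}

for variant, version in resolved.items():
    status = "inside" if within_bridge_window(version) else "outside (needs override)"
    print(f"{variant}: transformers {version} is {status} the bridge window")
```

Under these assumptions the SGLang variant stays inside the window, while the vLLM variant falls outside it, which is why that variant needs an explicit override.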

API Compatibility Audit

All 6 updated focused packages audited against upstream source — 0 breaking changes:

  • megatron-core (18 entries): all signatures preserved
  • megatron-bridge (7 entries): all APIs compatible; save_hf_adapter is now native, so the existing monkey patch is dead code
  • sglang (14 entries): all HTTP endpoints and SDK imports preserved
  • vllm (14 entries): all public + private APIs compatible
  • transformers (12 entries): verified against both 5.3.0 and 5.5.4
  • peft (4 entries): LoraConfig, TaskType, get_peft_model unchanged

Risk Areas

  1. Docker base image: lmsysorg/sglang:v0.5.10.post1-cu129-amd64-runtime tag needs CI validation
  2. vLLM variant transformers override: 5.5.4 exceeds megatron-bridge's <=5.3.0 ceiling — minor runtime risk
  3. megatron-bridge + transformers 5.x: compatibility was checked against the megatron-bridge 0.4.0 source only; runtime testing is still needed

Related Issue

N/A — routine dependency upgrade

Type of Change

  • 🐛 Bug fix
  • ✨ New feature
  • 💥 Breaking change
  • 📝 Documentation update
  • ♻️ Refactoring
  • ⚡ Performance improvement
  • ✅ Test coverage improvement

Checklist

  • I have read the Contributing Guide
  • Pre-commit hooks pass (pre-commit run --all-files)
  • Relevant tests pass; new tests added for new functionality
  • Documentation updated (if applicable; built with ./docs/build_all.sh)
  • Branch is up to date with main
  • Self-reviewed via /review-pr command
  • This PR was created by a coding agent via /create-pr
  • This PR is a breaking change

Breaking Change Details:

  • megatron-core and megatron-bridge now require Python ≥3.12 (upstream requirement). Python 3.11 environments will not install megatron packages.
  • transformers upgraded from 4.x to 5.x (SGLang: 5.3.0, vLLM: 5.5.4). Any custom code depending on removed 4.x APIs may need updates.
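
The effect of the Python ≥3.12 marker can be illustrated with a small sketch that mirrors the marker expression used in this PR (`python_version >= '3.12' and sys_platform == 'linux' and platform_machine == 'x86_64'`). The function name `megatron_marker_applies` is hypothetical; real marker evaluation is done by the installer, and this only reproduces the boolean logic.

```python
import platform
import sys


def megatron_marker_applies(
    python_version: tuple[int, int],
    sys_platform: str,
    platform_machine: str,
) -> bool:
    """Mirror the marker: Python >= 3.12, Linux, x86_64."""
    return (
        python_version >= (3, 12)
        and sys_platform == "linux"
        and platform_machine == "x86_64"
    )


# Evaluate the same condition for the interpreter running this script.
current = megatron_marker_applies(
    python_version=sys.version_info[:2],
    sys_platform=sys.platform,
    platform_machine=platform.machine(),
)
print(f"megatron packages would be selected here: {current}")
```

For example, `megatron_marker_applies((3, 11), "linux", "x86_64")` evaluates to `False`, which corresponds to a Python 3.11 environment skipping the megatron packages.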

Additional Context

  • Full audit details in upgrade-summary.md (gitignored, generated during upgrade)
  • The upgrade-deps skill with per-package API checklists is included for future upgrades
  • _monkey_patch_save_hf_adapter() in megatron_lora.py is now dead code (megatron-bridge 0.4.0 ships native save_hf_adapter). Can be cleaned up in a follow-up PR.

garrett4wade requested a review from fishcrap as a code owner on April 18, 2026 12:09

gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request introduces a new upgrade-deps skill and a comprehensive set of API compatibility checklists for core dependencies like megatron-core, transformers, sglang, and vllm. It updates several package versions in pyproject.toml and pyproject.vllm.toml, including a bump to megatron-core 0.17.0 and the addition of Python 3.12 version markers. Feedback identifies a missing architecture suffix in the Dockerfile base image and recommends consistent application of the Python 3.12 marker to the mbridge package across both project manifests.

Comment thread Dockerfile
  # docker build --build-arg VARIANT=vllm -t areal-runtime:dev-vllm . # vllm variant

- FROM lmsysorg/sglang:v0.5.9-cu129-amd64-runtime
+ FROM lmsysorg/sglang:v0.5.10.post1-runtime

medium

The base image tag v0.5.10.post1-runtime is missing the -cu129-amd64 suffix mentioned in the pull request description. This discrepancy might lead to pulling an incorrect image variant (e.g., a different CUDA version or architecture), which could cause runtime issues or build failures for C++ extensions that depend on specific CUDA headers.

FROM lmsysorg/sglang:v0.5.10.post1-cu129-amd64-runtime
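
A check like the following could catch this kind of tag drift before a build. It is a rough sketch that assumes the tag layout `v<version>-cu<digits>-<arch>-runtime`, inferred only from the tags quoted in this PR; the names `base_image_tag` and `is_fully_pinned` are hypothetical and not part of the repository.

```python
import re

# Assumed tag layout, inferred from the tags in this PR:
# v<version>-cu<cuda digits>-<architecture>-runtime
PINNED_TAG = re.compile(r"^v[\w.]+-cu\d+-[a-z0-9]+-runtime$")


def base_image_tag(from_line: str) -> str:
    """Extract the tag from a Dockerfile 'FROM image:tag' line."""
    image_ref = from_line.split()[1]
    _, _, tag = image_ref.partition(":")
    return tag


def is_fully_pinned(from_line: str) -> bool:
    """True if the tag names both a CUDA version and an architecture."""
    return PINNED_TAG.match(base_image_tag(from_line)) is not None


for line in (
    "FROM lmsysorg/sglang:v0.5.10.post1-runtime",
    "FROM lmsysorg/sglang:v0.5.10.post1-cu129-amd64-runtime",
):
    print(f"{line} -> fully pinned: {is_fully_pinned(line)}")
```

With this pattern, the tag in the diff is reported as not fully pinned, while the suggested tag passes.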

Comment thread pyproject.toml Outdated
  megatron = [
- "megatron-core==0.16.0; sys_platform == 'linux' and platform_machine == 'x86_64'",
+ "megatron-core==0.17.0; python_version >= '3.12' and sys_platform == 'linux' and platform_machine == 'x86_64'",
  "mbridge==0.15.1; sys_platform == 'linux' and platform_machine == 'x86_64'",

medium

The mbridge package is missing the python_version >= '3.12' marker, which was added to megatron-core and megatron-bridge in this pull request. Since these packages are part of the same functional group and share the same upstream platform requirements, the marker should be applied consistently to avoid resolution issues or partial installations in Python 3.11 environments.

Suggested change
- "mbridge==0.15.1; sys_platform == 'linux' and platform_machine == 'x86_64'",
+ "mbridge==0.15.1; python_version >= '3.12' and sys_platform == 'linux' and platform_machine == 'x86_64'",

Comment thread pyproject.vllm.toml Outdated
  megatron = [
- "megatron-core==0.16.0; sys_platform == 'linux' and platform_machine == 'x86_64'",
+ "megatron-core==0.17.0; python_version >= '3.12' and sys_platform == 'linux' and platform_machine == 'x86_64'",
  "mbridge==0.15.1; sys_platform == 'linux' and platform_machine == 'x86_64'",

medium

The mbridge package is missing the python_version >= '3.12' marker, which is inconsistent with the other megatron-related packages in this extra. This should be updated to match the constraints applied in the default pyproject.toml to ensure consistent dependency resolution across variants.

Suggested change
- "mbridge==0.15.1; sys_platform == 'linux' and platform_machine == 'x86_64'",
+ "mbridge==0.15.1; python_version >= '3.12' and sys_platform == 'linux' and platform_machine == 'x86_64'",

Port checkpoint handling from custom hf_load/hf_save to
megatron-bridge, update Dockerfile for uv sync with
dependency-metadata support, and fix VLM rollout issues.

Key changes:
- Upgrade megatron-core, megatron-bridge, sglang, vllm in pyproject.toml
- Replace areal/models/mcore/hf_load.py and hf_save.py with megatron-bridge
- Switch Dockerfile to uv sync for dependency-metadata override support
- Add nvidia-modelopt[hf] and cppimport dependency-metadata override
- Fix VLM RL with mm_token_type_ids handling
- Fix reward timeout configuration
- Consolidate upgrade skills into single upgrade-deps skill
- Update docs for bridge backend, tree training, and installation

garrett4wade force-pushed the chore/upgrade-deps-2026-04 branch from 3899666 to 60a27d1 on April 21, 2026 08:42