Skip to content

[WIP]feat: support Speculative Decoding by Sglang Eagle algo#1176

Draft
TaoZex wants to merge 24 commits intoinclusionAI:mainfrom
TaoZex:spec_v1
Draft

[WIP]feat: support Speculative Decoding by Sglang Eagle algo#1176
TaoZex wants to merge 24 commits intoinclusionAI:mainfrom
TaoZex:spec_v1

Conversation

@TaoZex
Copy link
Copy Markdown
Contributor

@TaoZex TaoZex commented Apr 13, 2026

Description

Related Issue

Fixes #(issue)

Type of Change

  • 🐛 Bug fix
  • ✨ New feature
  • 💥 Breaking change
  • 📝 Documentation update
  • ♻️ Refactoring
  • ⚡ Performance improvement
  • ✅ Test coverage improvement

Checklist

  • I have read the Contributing Guide
  • Pre-commit hooks pass (pre-commit run --all-files)
  • Relevant tests pass; new tests added for new functionality
  • Documentation updated (if applicable; built with ./docs/build_all.sh)
  • Branch is up to date with main
  • Self-reviewed via /review-pr command
  • This PR was created by a coding agent via /create-pr
  • This PR is a breaking change

Breaking Change Details (if applicable):

Additional Context


Need help? Check the Contributing Guide or ask in
GitHub Discussions!

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces support for speculative decoding using EAGLE and Multi-Token Prediction (MTP) online training. Key changes include the addition of MTP and speculative decoding configuration fields across various API and engine components, implementation of MTP loss collection and gradient isolation in the Megatron engine, and the inclusion of speculative decoding statistics in model responses. Additionally, the PR provides comprehensive documentation, example configurations, and end-to-end tests. Feedback focuses on improving configuration consistency in SGLangConfig, removing redundant no-op string replacements in the MTP layer conversion utility, and simplifying logic in the RLVR workflow by removing a redundant conditional check.

"help": "Attention mode for speculative decoding. E.g., 'full', 'sparse'."
},
)
enable_multi_layer_eagle: bool = False
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

For consistency with other configuration fields in this dataclass, enable_multi_layer_eagle should be defined using field(). This also provides an opportunity to add a help string in the metadata for better documentation and discoverability through CLI help messages.

    enable_multi_layer_eagle: bool = field(
        default=False,
        metadata={"help": "Enable multi-layer EAGLE for speculative decoding."},
    )

Comment on lines +192 to +193
hf_remainder = hf_remainder.replace("enorm.weight", "enorm.weight")
hf_remainder = hf_remainder.replace("hnorm.weight", "hnorm.weight")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

These lines are no-ops as they replace a string with itself. They should either be corrected to perform the intended name mapping as suggested by the comment on line 191, or removed if they are not needed.

Comment on lines +136 to +140
accept_rate = (
resp.spec_accept_token_num / resp.spec_draft_token_num
if resp.spec_draft_token_num > 0
else 0.0
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The check if resp.spec_draft_token_num > 0 is redundant here, as it's already guaranteed by the outer if condition on line 135. You can simplify this to a direct division.

            accept_rate = resp.spec_accept_token_num / resp.spec_draft_token_num

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant