feat(engine): Add Karmarkar-Karp algorithm #1151

Open

TaoZex wants to merge 22 commits into inclusionAI:main from TaoZex:kk
Conversation

TaoZex (Contributor) commented Apr 8, 2026

Description

The core change adds a Karmarkar-Karp (Largest Differencing Method) partitioning algorithm as an alternative to the existing First Fit Decreasing (FFD) for sequence packing / micro-batch allocation. KK produces more balanced partitions (lower max-min spread across groups), which is especially beneficial for RL training workloads with highly variable sequence lengths and bimodal distributions.
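For readers unfamiliar with the method, here is a minimal, illustrative k-way Largest Differencing Method sketch. This is not the seqpack.py implementation from this PR; the function and variable names below are invented for illustration.

```python
import heapq

def ldm_partition(lengths, k):
    """Illustrative k-way Karmarkar-Karp (Largest Differencing Method).

    A self-contained sketch, not this PR's implementation. Each heap
    entry is a partial k-way partition, prioritized by its spread
    (max load - min load); merging two partitions offsets the heaviest
    groups of one against the lightest groups of the other.
    """
    heap = []
    for i, n in enumerate(lengths):
        # Start with each item alone in a k-group partition.
        groups = tuple([(n, [n])] + [(0, []) for _ in range(k - 1)])
        heapq.heappush(heap, (-n, i, groups))  # spread of a singleton is n
    tiebreak = len(lengths)
    while len(heap) > 1:
        _, _, a = heapq.heappop(heap)  # the two partitions with largest spread
        _, _, b = heapq.heappop(heap)
        # Pair a's heaviest group with b's lightest, and so on down the line.
        merged = [(la + lb, ia + ib)
                  for (la, ia), (lb, ib) in zip(a, reversed(b))]
        merged.sort(key=lambda g: -g[0])
        spread = merged[0][0] - merged[-1][0]
        heapq.heappush(heap, (-spread, tiebreak, tuple(merged)))
        tiebreak += 1
    return [items for _, items in heap[0][2]]

# Example: splitting lengths [8, 7, 6, 5, 4] into 2 groups
# yields loads 16 and 14 (spread 2); the optimum is 15/15.
```

Since each step merges two partial partitions, n items need n − 1 merges; the greedy differencing is what drives the spread toward zero.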

The implementation is fully configurable: users can select "ffd" (default, backward-compatible) or "kk" via the new packing_algorithm field in MicroBatchSpec, either in YAML configs or programmatically.
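A hypothetical YAML fragment: only the packing_algorithm field and its "ffd"/"kk" values come from this PR; the enclosing key name is a placeholder.

```yaml
# Hypothetical sketch: only `packing_algorithm` and its values
# "ffd" / "kk" come from this PR; the enclosing key is a placeholder.
micro_batch_spec:
  packing_algorithm: kk   # default: "ffd" (backward-compatible)
```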

Related Issue

Fixes #(issue)

Type of Change

  • 🐛 Bug fix
  • ✨ New feature
  • 💥 Breaking change
  • 📝 Documentation update
  • ♻️ Refactoring
  • ⚡ Performance improvement
  • ✅ Test coverage improvement

Checklist

  • I have read the Contributing Guide
  • Pre-commit hooks pass (pre-commit run --all-files)
  • Relevant tests pass; new tests added for new functionality
  • Documentation updated (if applicable; built with ./docs/build_all.sh)
  • Branch is up to date with main
  • Self-reviewed via /review-pr command
  • This PR was created by a coding agent via /create-pr
  • This PR is a breaking change

Breaking Change Details (if applicable):

Additional Context

  • seqpack.py
    Adds _KKSet, _KKState, _kk_partition (the core KK Largest Differencing Method using a max-heap), kk_allocate (a drop-in replacement for ffd_allocate with a capacity safety net and FFD fallback), _compute_packing_metrics (diagnostic metrics: spread, CV, imbalance ratio, utilization, wasted tokens), and get_allocate_fn (algorithm registry / dispatch).
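As an illustration of the kind of diagnostics _compute_packing_metrics reports, here is a hypothetical helper; the real function lives in areal/utils/seqpack.py, and its exact signature and formulas are assumptions here.

```python
from statistics import mean, pstdev

def packing_metrics(group_loads, capacity):
    """Hypothetical diagnostics in the spirit of _compute_packing_metrics.

    The real function lives in areal/utils/seqpack.py; the signature
    and formulas here are assumptions for illustration.
    """
    m = mean(group_loads)
    spread = max(group_loads) - min(group_loads)
    total = sum(group_loads)
    budget = capacity * len(group_loads)  # total token budget across groups
    return {
        "n_groups": len(group_loads),
        "max_load": max(group_loads),
        "min_load": min(group_loads),
        "mean_load": m,
        "spread": spread,
        "imbalance_ratio": spread / m if m else 0.0,
        "cv": pstdev(group_loads) / m if m else 0.0,
        "utilization": total / budget if budget else 0.0,
        "wasted_tokens": budget - total,
    }

# e.g. packing_metrics([9, 11], capacity=12)
#   -> spread 2, imbalance_ratio 0.2, cv 0.1, wasted_tokens 4
```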

Need help? Check the Contributing Guide or ask in GitHub Discussions!

gemini-code-assist bot (Contributor) left a comment


Code Review

This pull request implements the Karmarkar-Karp (KK) sequence packing algorithm to enhance load balancing during training, adding configuration support in MicroBatchSpec and updating redistribution logic. Review feedback identifies that the new tests rely on inline implementations rather than the production code, which undermines their validity. Other suggestions include removing redundant or unused imports, updating outdated comments, and addressing a potential logic gap where the FFD fallback in the KK implementation may not respect the equal_size constraint.

TaoZex changed the title from feat(engine): Add Karmarkar-Karp algorithm to [WIP] feat(engine): Add Karmarkar-Karp algorithm on Apr 8, 2026
TaoZex (Contributor, Author) commented Apr 9, 2026

Training was conducted using the examples/math/gsm8k_grpo.yaml configuration (with 4-way data parallelism applied to both runs), and the metrics comparison between the KK and FFD algorithms is presented below:

KK vs. FFD over 100 training steps

Task reward

[image: task_reward comparison, KK vs. FFD]

The task_reward value remains consistent across all training steps.

Train step time

[image: train step time comparison, KK vs. FFD]

Because the KK algorithm balances load better, long-tail latency within a batch is reduced, producing a significant drop in the train_time metric.

KK vs. FFD Algorithm Average Metrics Comparison

| Metric | Definition | KK Average | FFD Average |
| --- | --- | --- | --- |
| n_groups | Number of groups | 10.21 | 10.26 |
| max_load | Maximum group load | 9677.78 | 10113.67 |
| min_load | Minimum group load | 9658.08 | 5864.03 |
| mean_load | Average group load | 9670.30 | 9624.87 |
| spread | max_load − min_load | 19.70 | 4249.64 |
| imbalance_ratio | spread / mean_load | 0.0020 | 0.4475 |
| std_dev | Standard deviation of inter-group load | 8.54 | 1244.96 |
| cv | Coefficient of variation (std/mean) | 0.00089 | 0.1311 |
| time | Algorithm execution time (seconds) | 0.0086 | 0.0029 |

KK Improvement Magnitude vs. FFD

| Metric | Improvement Magnitude |
| --- | --- |
| spread | −97.84% (load gap nearly eliminated) |
| std_dev | −97.36% |
| cv | −97.38% |
| max_load | −4.27% |
| wasted_tokens | −4.57% |

The KK algorithm comprehensively outperforms FFD in load balancing, with improvements of over 97% for spread, std_dev, and cv.
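The balance gap can be reproduced on toy data. The sketch below compares a greedy largest-first 2-way split (LPT, a simple greedy baseline standing in for FFD's greedy flavor, not the repo's ffd_allocate) against plain Karmarkar-Karp differencing, which returns the spread of its induced partition.

```python
import heapq

def greedy_spread(lengths):
    """Greedy largest-first 2-way split (LPT): each item goes to the
    currently lighter group. A simple greedy baseline, not ffd_allocate."""
    loads = [0, 0]
    for n in sorted(lengths, reverse=True):
        loads[loads.index(min(loads))] += n
    return abs(loads[0] - loads[1])

def ldm_spread(lengths):
    """Karmarkar-Karp differencing for 2 groups: repeatedly replace the
    two largest values with their difference; the lone survivor is the
    spread of the induced partition."""
    heap = [-n for n in lengths]
    heapq.heapify(heap)
    while len(heap) > 1:
        a, b = -heapq.heappop(heap), -heapq.heappop(heap)
        heapq.heappush(heap, -(a - b))  # a >= b, so the difference is >= 0
    return -heap[0]

print(greedy_spread([8, 7, 6, 5, 4]))  # -> 4
print(ldm_spread([8, 7, 6, 5, 4]))     # -> 2
```

Even on five items the differencing method halves the greedy spread; on bimodal RL sequence-length distributions the gap grows, which is the effect the tables above measure.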

Full training run using the KK algorithm

[image: reward curve over the full KK training run]

No abnormal fluctuations were observed in the reward values, which confirms that the algorithm supports stable training.

TaoZex (Contributor, Author) commented Apr 9, 2026

Tests

Results from running the new test scripts:

  1. test_kk_e2e.py — [image: test output]
  2. test_kk_allocate.py — [image: test output]
  3. run_kk_vs_ffd.py — [image: script output]

@TaoZex TaoZex marked this pull request as ready for review April 9, 2026 06:23
TaoZex (Contributor, Author) commented Apr 9, 2026

/gemini review

TaoZex changed the title from [WIP] feat(engine): Add Karmarkar-Karp algorithm to feat(engine): Add Karmarkar-Karp algorithm on Apr 9, 2026
gemini-code-assist bot (Contributor) left a comment


Code Review

This pull request introduces the Karmarkar-Karp (KK) algorithm as an alternative sequence packing method to the existing First Fit Decreasing (FFD) algorithm. The KK algorithm, implemented with new utility classes and functions in areal/utils/seqpack.py, aims to provide more balanced micro-batch allocation, especially beneficial for variable-length sequences in large-scale RL training. The MicroBatchSpec in areal/api/cli_args.py is updated to allow configuration of the packing algorithm, and its usage is integrated into areal/infra/dist_rollout.py and areal/utils/data.py. Comprehensive documentation has been added to explain the new algorithms and their use cases, and extensive unit and end-to-end tests confirm the correctness and benefits of KK. Feedback includes improving the readability of nested attribute access in dist_rollout.py, clarifying the time complexity of the KK algorithm in the documentation, and refining the simulation logic in run_kk_vs_ffd.py to better mirror production code.

TaoZex (Contributor, Author) commented Apr 13, 2026

@garrett4wade Hi, could you please help review this code when you have some free time? Thanks~
