Separate SFT+RL from RL only #17

@rawsh-rubrik

Hi!

I think that models trained with distillation followed by reinforcement learning, or multiple distillation steps, should have a separate section (or at least an “SFT + RL” indication).

For instance, among the models based on Qwen2.5-Math-7B, Sky-T1-7B is trained with a 4-step SFT->RL->SFT->RL pipeline, whereas Oat-Zero and others are trained in a single RL run.
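
To make the suggestion concrete, here is a rough sketch of how such an indication could be derived from a per-model list of training stages. Everything in it (the `Entry` class, the `stages` field, the `recipe_label` function, and the label strings) is hypothetical and not taken from the leaderboard's code; only the two example models and their pipelines come from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    # Hypothetical leaderboard record; field names are assumptions.
    model: str
    base: str
    stages: tuple  # training stages in order, e.g. ("SFT", "RL")


def recipe_label(stages):
    """Collapse an ordered stage list into a coarse training-recipe label."""
    kinds = set(stages)
    if kinds == {"RL"}:
        return "RL only"
    if kinds == {"SFT"}:
        return "SFT only"
    if kinds == {"SFT", "RL"}:
        return "SFT + RL"
    return "other"  # empty or unrecognized stage names


entries = [
    Entry("Sky-T1-7B", "Qwen2.5-Math-7B", ("SFT", "RL", "SFT", "RL")),
    Entry("Oat-Zero", "Qwen2.5-Math-7B", ("RL",)),
]

for e in entries:
    print(f"{e.model}: {recipe_label(e.stages)}")
```

Keeping the full ordered stage list (rather than only the label) would also let the table show multi-step pipelines explicitly if that turns out to be useful.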

Wdyt?

(Aside: I think the leaderboard would benefit greatly from a verified / unverified section, like SWE-bench has, so that new releases can be added and compared quickly. It would need an easy way to run the full pipeline locally, but I think this would be very useful to the community.)


Labels: enhancement
