Hi!
I think that models trained with distillation followed by reinforcement learning, or multiple distillation steps, should have a separate section (or at least an “SFT + RL” indication).
For instance, starting from Qwen2.5-Math-7B, the Sky-T1-7B model is trained with a 4-step SFT->RL->SFT->RL pipeline, whereas Oat-Zero and others are trained in a single RL run.
Wdyt?
(aside: I think the leaderboard would benefit greatly from a verified / unverified split like SWE-bench, so that new releases can be added and compared quickly. It would need an easy way to run the full pipeline locally, but I think this would be very useful to the community.)