SFTvsRL Models & Data
Collection
This collection contains 4 initial checkpoints for https://github.com/LeslieTrue/SFTvsRL and the data required for V-IRL training.
This model serves as an initial checkpoint for reproducing the results in the paper "SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training".
Website: https://tianzhechu.com/SFTvsRL/
GitHub: https://github.com/LeslieTrue/SFTvsRL