VIRL-L-Init

This model serves as an initial checkpoint for reproducing the results in the paper "SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training".
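To use the checkpoint as a starting point, the weights first need to be on local disk. The sketch below is one possible way to fetch them, assuming the `huggingface_hub` package is installed; the repository id is the one shown on the model page, while the target directory and the helper names `download_checkpoint` and `approx_weights_gb` are hypothetical and chosen only for illustration. The training scripts in the GitHub repository may expect a different layout.

```python
from pathlib import Path

# Repository id as shown on the Hugging Face model page.
REPO_ID = "tianzhechu/VIRL-L-Init"


def approx_weights_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Rough size of the weights in GB (BF16 stores 2 bytes per parameter)."""
    return num_params * bytes_per_param / 1e9


def download_checkpoint(local_dir: str = "checkpoints/VIRL-L-Init") -> Path:
    """Download all files of the repository and return the local folder.

    The default local_dir is a hypothetical path, not one the authors prescribe.
    """
    # Imported here so the module can be loaded without the dependency installed.
    from huggingface_hub import snapshot_download

    path = snapshot_download(repo_id=REPO_ID, local_dir=local_dir)
    return Path(path)
```

Calling `download_checkpoint()` starts the download. With about 10.7B parameters stored in BF16, `approx_weights_gb(10.7e9)` suggests planning for a little over 21 GB of disk space.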

Related links

Website: https://tianzhechu.com/SFTvsRL/

GitHub: https://github.com/LeslieTrue/SFTvsRL

arXiv: https://arxiv.org/abs/2501.17161v1

HF: https://huggingface.co/papers/2501.17161

Model size: 10.7B params (Safetensors)

Tensor type: BF16