SmolLM Variation: PPO & DPO Fine-Tuning for RLHF

Collection (3 items). This collection presents fine-tuning of the SmolLM model with two reinforcement learning from human feedback (RLHF) approaches: DPO (Direct Preference Optimization) and PPO (Proximal Policy Optimization).
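The sketch below illustrates how the two objectives differ. It assumes the standard per-example formulations: DPO as a logistic loss on the policy's log-probability margin relative to a frozen reference model, and PPO as the clipped surrogate objective. The function names and the default `beta` and `clip_eps` values are illustrative assumptions. They are not taken from the training code behind this collection.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair (illustrative sketch).

    Inputs are summed sequence log-probabilities of the chosen and rejected
    responses under the trainable policy and the frozen reference model.
    """
    # Implicit reward margin: how much more the policy favors the chosen
    # response over the rejected one, measured relative to the reference.
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    # Loss is -log(sigmoid(beta * margin)) = softplus(-beta * margin),
    # computed in a numerically stable form.
    z = -beta * margin
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def ppo_clipped_objective(logp_new, logp_old, advantage, clip_eps=0.2):
    """PPO clipped surrogate for one action/token (to be maximized)."""
    # Probability ratio between the updated policy and the policy that
    # generated the sample.
    ratio = math.exp(logp_new - logp_old)
    # Clipping keeps the update from moving the ratio far outside
    # [1 - clip_eps, 1 + clip_eps].
    clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    # Take the pessimistic (smaller) of the unclipped and clipped terms.
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    print(dpo_loss(-10.0, -12.0, -11.0, -11.0))
    print(ppo_clipped_objective(math.log(2.0), 0.0, 1.0))
```

The main design difference is that DPO needs only preference pairs and a reference model, with no separate reward model or rollouts. PPO instead optimizes against advantages that are usually estimated from a learned reward model and a value function during on-policy sampling.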