SimpleRL Collection: The collection for the project "Simple Reinforcement Learning for Reasoning" • 2 items • Updated 19 days ago • 5
Article: Navigating the RLHF Landscape: From Policy Gradients to PPO, GAE, and DPO for LLM Alignment • By NormalUhr • 26 days ago • 10
The Ultra-Scale Playbook 🌌: The ultimate guide to training LLMs on large GPU clusters • 2.14k