Running • 2.3k • The Ultra-Scale Playbook 🌌 The ultimate guide to training LLM on large GPU Clusters
allenai/RLVR-GSM-MATH-IF-Mixed-Constraints Viewer • Updated Nov 26, 2024 • 29.9k • 1.04k • 20
Teaching Language Models to Critique via Reinforcement Learning Paper • 2502.03492 • Published Feb 5 • 24
CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction Paper • 2502.07316 • Published Feb 11 • 47
UI Agent Collection a collection of algorithmic agents for user interfaces/interactions, program synthesis, and robots • 323 items • Updated about 6 hours ago • 48
nuprl/stack-dedup-python-testgen-starcoder-filter-v2 Viewer • Updated Feb 29, 2024 • 158k • 216 • 7