Running • 2.24k • The Ultra-Scale Playbook 🌌 The ultimate guide to training LLM on large GPU Clusters
Teaching Language Models to Critique via Reinforcement Learning Paper • 2502.03492 • Published Feb 5 • 24
CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction Paper • 2502.07316 • Published about 1 month ago • 47
UI Agent Collection a collection of algorithmic agents for user interfaces/interactions, program synthesis, and robots • 314 items • Updated about 1 hour ago • 47
nuprl/stack-dedup-python-testgen-starcoder-filter-v2 Viewer • Updated Feb 29, 2024 • 158k • 240 • 7