Towards General-Purpose Model-Free Reinforcement Learning • Paper • 2501.16142 • Published 6 days ago • 22
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training • Paper • 2501.17161 • Published 5 days ago • 76
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization • Paper • 2412.12098 • Published Dec 16, 2024 • 4
RLDG: Robotic Generalist Policy Distillation via Reinforcement Learning • Paper • 2412.09858 • Published Dec 13, 2024 • 1
Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs • Paper • 2501.18585 • Published 3 days ago • 34