Revisiting On-Policy Distillation: Empirical Failure Modes and Simple Fixes Paper • 2603.25562 • Published Mar 26
DRAGON: Distributional Rewards Optimize Diffusion Generative Models Paper • 2504.15217 • Published Apr 21, 2025
Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning Paper • 2510.25992 • Published Oct 29, 2025