RL - an Andynsn Collection

updated about 7 hours ago

Revisiting On-Policy Distillation: Empirical Failure Modes and Simple Fixes
Paper • 2603.25562 • Published Mar 26 • 19

Self-Distilled Agentic Reinforcement Learning
Paper • 2605.15155 • Published 6 days ago • 100

DRAGON: Distributional Rewards Optimize Diffusion Generative Models
Paper • 2504.15217 • Published Apr 21, 2025 • 11

Diffusion Policy Policy Optimization
Paper • 2409.00588 • Published Sep 1, 2024 • 20

Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning
Paper • 2510.25992 • Published Oct 29, 2025 • 48