Understanding R1-Zero-Like Training: A Critical Perspective Paper • 2503.20783 • Published 15 days ago
Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models Paper • 2503.09573 • Published 29 days ago
Frac-Connections: Fractional Extension of Hyper-Connections Paper • 2503.14125 • Published 24 days ago