Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning • arXiv:2503.07572
LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL • arXiv:2503.07536
Linear-MoE: Linear Sequence Modeling Meets Mixture-of-Experts • arXiv:2503.05447
Forgetting Transformer: Softmax Attention with a Forget Gate • arXiv:2503.02130
HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization • arXiv:2503.04598
PLDR-LLMs Learn A Generalizable Tensor Operator That Can Replace Its Own Deep Neural Net At Inference • arXiv:2502.13502
Liger: Linearizing Large Language Models to Gated Recurrent Structures • arXiv:2503.01496
Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs • arXiv:2503.01307
Scale-Distribution Decoupling: Enabling Stable and Effective Training of Large Language Models • arXiv:2502.15499
Stable-SPAM: How to Train in 4-Bit More Stably than 16-Bit Adam • arXiv:2502.17055
AlphaMaze: Enhancing Large Language Models' Spatial Intelligence via GRPO • arXiv:2502.14669
Revisiting the Test-Time Scaling of o1-like Models: Do they Truly Possess Test-Time Scaling Capabilities? • arXiv:2502.12215