- Mixture-of-Mamba: Enhancing Multi-Modal State-Space Models with Modality-Aware Sparsity (arXiv:2501.16295)
- Return of the Encoder: Maximizing Parameter Efficiency for SLMs (arXiv:2501.16273)
- ARWKV: Pretrain is not what we need, an RNN-Attention-Based Language Model Born from Transformer (arXiv:2501.15570)
- Towards General-Purpose Model-Free Reinforcement Learning (arXiv:2501.16142)
- Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs (arXiv:2501.18585)
- PhysBench: Benchmarking and Enhancing Vision-Language Models for Physical World Understanding (arXiv:2501.16411)
- Large Language Models Think Too Fast To Explore Effectively (arXiv:2501.18009)
- Early External Safety Testing of OpenAI's o3-mini: Insights from the Pre-Deployment Evaluation (arXiv:2501.17749)
- Any2AnyTryon: Leveraging Adaptive Position Embeddings for Versatile Virtual Clothing Tasks (arXiv:2501.15891)
- DiffSplat: Repurposing Image Diffusion Models for Scalable Gaussian Splat Generation (arXiv:2501.16764)
- Optimizing Large Language Model Training Using FP4 Quantization (arXiv:2501.17116)
- SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training (arXiv:2501.17161)
- CodeMonkeys: Scaling Test-Time Compute for Software Engineering (arXiv:2501.14723)
- Evolution and The Knightian Blindspot of Machine Learning (arXiv:2501.13075)