VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models Paper • 2504.15279 • Published 4 days ago • 61
Iterative Self-Training for Code Generation via Reinforced Re-Ranking Paper • 2504.09643 • Published 12 days ago • 34
VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning Paper • 2504.06958 • Published 16 days ago • 10
Scaling Laws for Native Multimodal Models Scaling Laws for Native Multimodal Models Paper • 2504.07951 • Published 15 days ago • 27
DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning Paper • 2504.07128 • Published 24 days ago • 82
FreSca: Unveiling the Scaling Space in Diffusion Models Paper • 2504.02154 • Published 23 days ago • 18
VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in One Step Paper • 2504.01956 • Published 23 days ago • 40
GPT-ImgEval: A Comprehensive Benchmark for Diagnosing GPT4o in Image Generation Paper • 2504.02782 • Published 22 days ago • 56
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems Paper • 2504.01990 • Published 25 days ago • 258
ZClip: Adaptive Spike Mitigation for LLM Pre-Training Paper • 2504.02507 • Published 22 days ago • 76
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference Paper • 2412.13663 • Published Dec 18, 2024 • 149
Token-Efficient Long Video Understanding for Multimodal LLMs Paper • 2503.04130 • Published Mar 6 • 94
LLM-Microscope: Uncovering the Hidden Role of Punctuation in Context Memory of Transformers Paper • 2502.15007 • Published Feb 20 • 175
SpargeAttn: Accurate Sparse Attention Accelerating Any Model Inference Paper • 2502.18137 • Published Feb 25 • 57
Multilingual Machine Translation with Open Large Language Models at Practical Scale: An Empirical Study Paper • 2502.02481 • Published Feb 4 • 13