-
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
Paper • 2402.04252 • Published • 26 -
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
Paper • 2402.03749 • Published • 13 -
ScreenAI: A Vision-Language Model for UI and Infographics Understanding
Paper • 2402.04615 • Published • 43 -
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
Paper • 2402.05008 • Published • 22
Collections
Discover the best community collections!
Collections including paper arxiv:2501.12599
-
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model
Paper • 2502.02737 • Published • 203 -
Demystifying Long Chain-of-Thought Reasoning in LLMs
Paper • 2502.03373 • Published • 56 -
Kimi k1.5: Scaling Reinforcement Learning with LLMs
Paper • 2501.12599 • Published • 103 -
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Paper • 2501.17161 • Published • 108
-
LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs
Paper • 2501.06186 • Published • 61 -
Virgo: A Preliminary Exploration on Reproducing o1-like MLLM
Paper • 2501.01904 • Published • 32 -
Kimi k1.5: Scaling Reinforcement Learning with LLMs
Paper • 2501.12599 • Published • 103 -
Visual-RFT: Visual Reinforcement Fine-Tuning
Paper • 2503.01785 • Published • 64
-
Qwen Technical Report
Paper • 2309.16609 • Published • 35 -
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models
Paper • 2311.07919 • Published • 10 -
Qwen2 Technical Report
Paper • 2407.10671 • Published • 161 -
Qwen2-Audio Technical Report
Paper • 2407.10759 • Published • 57
-
Kimi k1.5: Scaling Reinforcement Learning with LLMs
Paper • 2501.12599 • Published • 103 -
Teaching Language Models to Critique via Reinforcement Learning
Paper • 2502.03492 • Published • 24 -
NatureLM: Deciphering the Language of Nature for Scientific Discovery
Paper • 2502.07527 • Published • 19 -
MetaChain: A Fully-Automated and Zero-Code Framework for LLM Agents
Paper • 2502.05957 • Published • 16
-
MiniMax-01: Scaling Foundation Models with Lightning Attention
Paper • 2501.08313 • Published • 276 -
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Paper • 2501.12948 • Published • 346 -
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
Paper • 2501.00958 • Published • 100 -
The Lessons of Developing Process Reward Models in Mathematical Reasoning
Paper • 2501.07301 • Published • 92
-
Cosmos World Foundation Model Platform for Physical AI
Paper • 2501.03575 • Published • 69 -
Phi-4 Technical Report
Paper • 2412.08905 • Published • 111 -
MiniMax-01: Scaling Foundation Models with Lightning Attention
Paper • 2501.08313 • Published • 276 -
DeepSeek-V3 Technical Report
Paper • 2412.19437 • Published • 55