- EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
  Paper • 2402.04252 • Published • 26
- Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
  Paper • 2402.03749 • Published • 13
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding
  Paper • 2402.04615 • Published • 43
- EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
  Paper • 2402.05008 • Published • 22
Collections including paper arxiv:2503.00808

- Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs
  Paper • 2503.01743 • Published • 64
- Phi-4 Technical Report
  Paper • 2412.08905 • Published • 111
- Visual-RFT: Visual Reinforcement Fine-Tuning
  Paper • 2503.01785 • Published • 57
- When an LLM is apprehensive about its answers -- and when its uncertainty is justified
  Paper • 2503.01688 • Published • 19

- Scale-Distribution Decoupling: Enabling Stable and Effective Training of Large Language Models
  Paper • 2502.15499 • Published • 13
- MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs
  Paper • 2502.17422 • Published • 7
- The Lottery LLM Hypothesis, Rethinking What Abilities Should LLM Compression Preserve?
  Paper • 2502.17535 • Published • 8
- Scaling LLM Pre-training with Vocabulary Curriculum
  Paper • 2502.17910 • Published • 1

- Steel-LLM:From Scratch to Open Source -- A Personal Journey in Building a Chinese-Centric LLM
  Paper • 2502.06635 • Published • 4
- SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model
  Paper • 2502.02737 • Published • 199
- REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models
  Paper • 2501.03262 • Published • 92
- MiniMax-01: Scaling Foundation Models with Lightning Attention
  Paper • 2501.08313 • Published • 274

- SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models
  Paper • 2412.11605 • Published • 18
- Byte Latent Transformer: Patches Scale Better Than Tokens
  Paper • 2412.09871 • Published • 93
- Fourier Position Embedding: Enhancing Attention's Periodic Extension for Length Generalization
  Paper • 2412.17739 • Published • 41
- SKETCH: Structured Knowledge Enhanced Text Comprehension for Holistic Retrieval
  Paper • 2412.15443 • Published • 9

- Can Large Language Models Understand Context?
  Paper • 2402.00858 • Published • 23
- OLMo: Accelerating the Science of Language Models
  Paper • 2402.00838 • Published • 83
- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 147
- SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity
  Paper • 2401.17072 • Published • 25

- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 147
- Orion-14B: Open-source Multilingual Large Language Models
  Paper • 2401.12246 • Published • 13
- MambaByte: Token-free Selective State Space Model
  Paper • 2401.13660 • Published • 56
- MM-LLMs: Recent Advances in MultiModal Large Language Models
  Paper • 2401.13601 • Published • 47