SmolVLM: Redefining small and efficient multimodal models • Paper • 2504.05299 • Published Apr 7, 2025 • 164 upvotes
LLaVE: Large Language and Vision Embedding Models with Hardness-Weighted Contrastive Learning • Paper • 2503.04812 • Published Mar 4, 2025 • 14 upvotes
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models • Paper • 2402.03300 • Published Feb 5, 2024 • 116 upvotes
Cohere Labs Aya Vision • Collection • Aya Vision is a state-of-the-art family of vision models that brings multimodal capabilities to 23 languages. • 5 items • Updated 6 days ago • 68 upvotes
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model • Paper • 2502.02737 • Published Feb 4, 2025 • 226 upvotes
Sana • Collection • ⚡️Sana: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer • 21 items • Updated 5 days ago • 90 upvotes
VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents • Paper • 2410.10594 • Published Oct 14, 2024 • 27 upvotes
MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning • Paper • 2409.20566 • Published Sep 30, 2024 • 57 upvotes
CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling • Paper • 2409.19291 • Published Sep 28, 2024 • 19 upvotes
MMSearch: Benchmarking the Potential of Large Models as Multi-modal Search Engines • Paper • 2409.12959 • Published Sep 19, 2024 • 38 upvotes