-
Improved Baselines with Visual Instruction Tuning
Paper • 2310.03744 • Published • 36 -
DeepSeek-VL: Towards Real-World Vision-Language Understanding
Paper • 2403.05525 • Published • 39 -
Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities
Paper • 2308.12966 • Published • 6 -
LLaVA-Gemma: Accelerating Multimodal Foundation Models with a Compact Language Model
Paper • 2404.01331 • Published • 24
Collections
Discover the best community collections!
Collections including paper arxiv:2404.16790
-
Challenge LLMs to Reason About Reasoning: A Benchmark to Unveil Cognitive Depth in LLMs
Paper • 2312.17080 • Published • 1 -
Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing
Paper • 2404.12253 • Published • 53 -
SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension
Paper • 2404.16790 • Published • 7 -
A Thorough Examination of Decoding Methods in the Era of LLMs
Paper • 2402.06925 • Published • 1
-
The FinBen: An Holistic Financial Benchmark for Large Language Models
Paper • 2402.12659 • Published • 16 -
Long-context LLMs Struggle with Long In-context Learning
Paper • 2404.02060 • Published • 34 -
Challenge LLMs to Reason About Reasoning: A Benchmark to Unveil Cognitive Depth in LLMs
Paper • 2312.17080 • Published • 1 -
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Paper • 1804.07461 • Published • 4
-
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
Paper • 2403.09611 • Published • 124 -
OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents
Paper • 2306.16527 • Published • 47 -
Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models
Paper • 2404.12387 • Published • 38 -
SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension
Paper • 2404.16790 • Published • 7
-
FAX: Scalable and Differentiable Federated Primitives in JAX
Paper • 2403.07128 • Published • 11 -
mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding
Paper • 2403.12895 • Published • 29 -
Measuring Style Similarity in Diffusion Models
Paper • 2404.01292 • Published • 16 -
JetMoE: Reaching Llama2 Performance with 0.1M Dollars
Paper • 2404.07413 • Published • 36
-
TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones
Paper • 2312.16862 • Published • 30 -
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action
Paper • 2312.17172 • Published • 26 -
Towards Truly Zero-shot Compositional Visual Reasoning with LLMs as Programmers
Paper • 2401.01974 • Published • 5 -
From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations
Paper • 2401.01885 • Published • 27