Models
Datasets
Spaces
Posts
Docs
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2404.16790

Vision Language Models Papers 🖼️💬📝

Papers about vision-language models, most important ones are on top of the list.

Improved Baselines with Visual Instruction Tuning

Paper • 2310.03744 • Published Oct 5, 2023 • 36
DeepSeek-VL: Towards Real-World Vision-Language Understanding

Paper • 2403.05525 • Published Mar 8 • 39
Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities

Paper • 2308.12966 • Published Aug 24, 2023 • 6
LLaVA-Gemma: Accelerating Multimodal Foundation Models with a Compact Language Model

Paper • 2404.01331 • Published Mar 29 • 24

Papers - Tencent

Challenge LLMs to Reason About Reasoning: A Benchmark to Unveil Cognitive Depth in LLMs

Paper • 2312.17080 • Published Dec 28, 2023 • 1
Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing

Paper • 2404.12253 • Published Apr 18 • 53
SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension

Paper • 2404.16790 • Published Apr 25 • 7
A Thorough Examination of Decoding Methods in the Era of LLMs

Paper • 2402.06925 • Published Feb 10 • 1

Papers - Benchmarks

The FinBen: An Holistic Financial Benchmark for Large Language Models

Paper • 2402.12659 • Published Feb 20 • 16
Long-context LLMs Struggle with Long In-context Learning

Paper • 2404.02060 • Published Apr 2 • 34
Challenge LLMs to Reason About Reasoning: A Benchmark to Unveil Cognitive Depth in LLMs

Paper • 2312.17080 • Published Dec 28, 2023 • 1
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding

Paper • 1804.07461 • Published Apr 20, 2018 • 4

MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training

Paper • 2403.09611 • Published Mar 14 • 124
OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents

Paper • 2306.16527 • Published Jun 21, 2023 • 47
Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models

Paper • 2404.12387 • Published Apr 18 • 38
SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension

Paper • 2404.16790 • Published Apr 25 • 7

FAX: Scalable and Differentiable Federated Primitives in JAX

Paper • 2403.07128 • Published Mar 11 • 11
mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding

Paper • 2403.12895 • Published Mar 19 • 29
Measuring Style Similarity in Diffusion Models

Paper • 2404.01292 • Published Apr 1 • 16
JetMoE: Reaching Llama2 Performance with 0.1M Dollars

Paper • 2404.07413 • Published Apr 11 • 36

TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones

Paper • 2312.16862 • Published Dec 28, 2023 • 30
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action

Paper • 2312.17172 • Published Dec 28, 2023 • 26
Towards Truly Zero-shot Compositional Visual Reasoning with LLMs as Programmers

Paper • 2401.01974 • Published Jan 3 • 5
From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations

Paper • 2401.01885 • Published Jan 3 • 27

Previous
1
2
Next

Company

© Hugging Face

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs