vlms - a cooleel Collection

vlms

updated 5 days ago

Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages
Paper • 2410.16153 • Published 26 days ago • 42

AutoTrain: No-code training for state-of-the-art models
Paper • 2410.15735 • Published 26 days ago • 56

The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio
Paper • 2410.12787 • Published about 1 month ago • 30

LEOPARD : A Vision Language Model For Text-Rich Multi-Image Tasks
Paper • 2410.01744 • Published Oct 2 • 25

UCFE: A User-Centric Financial Expertise Benchmark for Large Language Models
Paper • 2410.14059 • Published 29 days ago • 52

NVLM: Open Frontier-Class Multimodal LLMs
Paper • 2409.11402 • Published Sep 17 • 71

MIO: A Foundation Model on Multimodal Tokens
Paper • 2409.17692 • Published Sep 26 • 49

Emu3: Next-Token Prediction is All You Need
Paper • 2409.18869 • Published Sep 27 • 90

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models
Paper • 2409.17146 • Published Sep 25 • 101

Analyzing The Language of Visual Tokens
Paper • 2411.05001 • Published 9 days ago • 19

Both Text and Images Leaked! A Systematic Analysis of Multimodal LLM Data Contamination
Paper • 2411.03823 • Published 10 days ago • 43