LLaVA-Video (Collection) • Models focused on video understanding (previously known as LLaVA-NeXT-Video) • 8 items
Model2Vec: Distill a Small, Fast Model from any Sentence Transformer (Article) • By Pringled and 1 other • Oct 14, 2024
Qwen2.5-VL (Collection) • Vision-language model series based on Qwen2.5 • 8 items
Multimodal Models (Collection) • Multimodal models with leading performance • 17 items
Molmo (Collection) • Artifacts for open multimodal language models • 5 items
AIMv2 (Collection) • AIMv2 vision encoders supporting multiple resolutions, native resolution, and a distilled checkpoint • 19 items
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models (Paper) • arXiv:2409.17146 • Published Sep 25, 2024
Llama 3.2 (Collection) • Transformers-format and original repos of the Llama 3.2 and Llama Guard 3 releases • 15 items
LLaVa-1.5 (Collection) • A series of vision-language models (VLMs) trained on a variety of visual instruction datasets • 3 items
LLaVa-NeXT (Collection) • LLaVa-NeXT (also known as LLaVa-1.6) improves on the 1.5 series with higher image resolutions and more reasoning/OCR datasets • 8 items
Vision-Language Modeling (Collection) • Datasets and models for vision-language modeling • 5 items
CogVLM2 (Collection) • Repos for THUDM's CogVLM2 releases • 8 items
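None of the cards above prescribes a loading recipe, but the LLaVA-family collections ship checkpoints in transformers format. As a minimal sketch of how one of these models might be used (assuming the llava-hf/llava-v1.6-mistral-7b-hf checkpoint from the LLaVa-NeXT collection; substitute any other member of that collection):

```python
# Minimal sketch: loading a LLaVa-NeXT checkpoint with transformers.
# The checkpoint ID and image URL below are assumptions for illustration.
import requests
import torch
from PIL import Image
from transformers import LlavaNextProcessor, LlavaNextForConditionalGeneration

model_id = "llava-hf/llava-v1.6-mistral-7b-hf"  # assumed collection member
processor = LlavaNextProcessor.from_pretrained(model_id)
model = LlavaNextForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Fetch an example image and build a prompt in the Mistral chat format
# that this checkpoint expects.
url = "https://llava-vl.github.io/static/images/view.jpg"
image = Image.open(requests.get(url, stream=True).raw)
prompt = "[INST] <image>\nWhat is shown in this image? [/INST]"

inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=100)
print(processor.decode(output[0], skip_special_tokens=True))
```

The other collections (Qwen2.5-VL, Molmo, CogVLM2, Llama 3.2) use their own processor and model classes, so this snippet does not transfer to them verbatim.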