VLM - a dipta007 Collection

updated Nov 10, 2024

MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer

Paper • 2401.10208 • Published Jan 18, 2024 • 1
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities

Paper • 2305.11172 • Published May 18, 2023 • 1
mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video

Paper • 2302.00402 • Published Feb 1, 2023
Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities

Paper • 2308.12966 • Published Aug 24, 2023 • 8
Unified Model for Image, Video, Audio and Language Tasks

Paper • 2307.16184 • Published Jul 30, 2023 • 15
Foundational Models Defining a New Era in Vision: A Survey and Outlook

Paper • 2307.13721 • Published Jul 25, 2023
InstructDiffusion: A Generalist Modeling Interface for Vision Tasks

Paper • 2309.03895 • Published Sep 7, 2023 • 14
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks

Paper • 2312.14238 • Published Dec 21, 2023 • 20
MMBench: Is Your Multi-modal Model an All-around Player?

Paper • 2307.06281 • Published Jul 12, 2023 • 5
GPT4All: An Ecosystem of Open Source Compressed Language Models

Paper • 2311.04931 • Published Nov 6, 2023 • 23
DeepSeek-VL: Towards Real-World Vision-Language Understanding

Paper • 2403.05525 • Published Mar 8, 2024 • 43
nvidia/NVLM-D-72B

Image-Text-to-Text • Updated Jan 14 • 19.2k • 764
Qwen/Qwen2-VL-72B-Instruct-AWQ

Image-Text-to-Text • Updated Sep 25, 2024 • 42.1k • 48
Qwen/Qwen2-VL-7B-Instruct

Image-Text-to-Text • Updated 28 days ago • 1.38M • 1.15k
Qwen/Qwen2-VL-72B-Instruct

Image-Text-to-Text • Updated 28 days ago • 168k • 280
HuggingFaceM4/Idefics3-8B-Llama3

Image-Text-to-Text • Updated Dec 2, 2024 • 43.1k • 271
mistralai/Pixtral-12B-2409

Image-Text-to-Text • Updated Dec 26, 2024 • • 616
OpenGVLab/InternVL2-8B

Image-Text-to-Text • Updated 29 days ago • 63.9k • 168
OpenGVLab/InternVL2-4B

Image-Text-to-Text • Updated 1 day ago • 18.5k • 50
OpenGVLab/InternVL2-Llama3-76B

Image-Text-to-Text • Updated 29 days ago • 26.5k • 213