VLMs
PsiPi/liuhaotian_llava-v1.5-13b-GGUF
Image-Text-to-Text • 13B • Updated • 638 downloads • 37 likes
ViGoR: Improving Visual Grounding of Large Vision Language Models with Fine-Grained Reward Modeling
Paper • 2402.06118 • Published • 15 upvotes
LEGO: Language Enhanced Multi-modal Grounding Model
Paper • 2401.06071 • Published • 12 upvotes
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models
Paper • 2403.18814 • Published • 48 upvotes
Visual CoT: Unleashing Chain-of-Thought Reasoning in Multi-Modal Language Models
Paper • 2403.16999 • Published • 5 upvotes
Salesforce/instructblip-vicuna-7b
Image-Text-to-Text • 8B • Updated • 12.1k downloads • 99 likes
Pegasus-v1 Technical Report
Paper • 2404.14687 • Published • 33 upvotes
List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs
Paper • 2404.16375 • Published • 18 upvotes
Needle In A Multimodal Haystack
Paper • 2406.07230 • Published • 55 upvotes
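Each paper above is listed by its arXiv ID. As a convenience, here is a minimal sketch that maps the IDs from this collection to their arXiv abstract pages; the `PAPERS` dict and the `arxiv_abs_url` helper are this note's own constructions, not part of any listed project.

```python
# arXiv IDs copied from the paper entries in this collection.
PAPERS = {
    "ViGoR": "2402.06118",
    "LEGO": "2401.06071",
    "Mini-Gemini": "2403.18814",
    "Visual CoT": "2403.16999",
    "Pegasus-v1": "2404.14687",
    "List Items One by One": "2404.16375",
    "Needle In A Multimodal Haystack": "2406.07230",
}

def arxiv_abs_url(arxiv_id: str) -> str:
    """Build the canonical arXiv abstract URL for an ID like '2402.06118'."""
    return f"https://arxiv.org/abs/{arxiv_id}"

for title, arxiv_id in PAPERS.items():
    print(f"{title}: {arxiv_abs_url(arxiv_id)}")
```

The same IDs also resolve on Hugging Face at `https://huggingface.co/papers/<id>`.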