mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality Paper • 2304.14178 • Published Apr 27, 2023 • 3
Chart-based Reasoning: Transferring Capabilities from LLMs to VLMs Paper • 2403.12596 • Published Mar 19, 2024 • 9
LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images Paper • 2403.11703 • Published Mar 18, 2024 • 16
GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering Paper • 1902.09506 • Published Feb 25, 2019 • 2
Lexicon-Level Contrastive Visual-Grounding Improves Language Modeling Paper • 2403.14551 • Published Mar 21, 2024 • 2
Prompt me a Dataset: An investigation of text-image prompting for historical image dataset creation using foundation models Paper • 2309.01674 • Published Sep 4, 2023 • 2
Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models Paper • 2404.07973 • Published Apr 11, 2024 • 30
RegionGPT: Towards Region Understanding Vision Language Model Paper • 2403.02330 • Published Mar 4, 2024 • 2
TextSquare: Scaling up Text-Centric Visual Instruction Tuning Paper • 2404.12803 • Published Apr 19, 2024 • 29