LLaVA-Gemma: Accelerating Multimodal Foundation Models with a Compact Language Model Paper • 2404.01331 • Published Mar 29 • 25
LVLM-Interpret: An Interpretability Tool for Large Vision-Language Models Paper • 2404.03118 • Published Apr 3 • 23
DesignQA: A Multimodal Benchmark for Evaluating Large Language Models' Understanding of Engineering Documentation Paper • 2404.07917 • Published Apr 11 • 1
Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models Paper • 2404.07973 • Published Apr 11 • 30
MagicQuill: An Intelligent Interactive Image Editing System Paper • 2411.09703 • Published Nov 14 • 57