ChatRex: Taming Multimodal LLM for Joint Perception and Understanding Paper • 2411.18363 • Published Nov 27, 2024 • 10
Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection Paper • 2405.10300 • Published May 16, 2024 • 28
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection Paper • 2303.05499 • Published Mar 9, 2023
Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models Paper • 2305.15023 • Published May 24, 2023
T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy Paper • 2403.14610 • Published Mar 21, 2024 • 3
Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks Paper • 2401.14159 • Published Jan 25, 2024 • 1
LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models Paper • 2312.02949 • Published Dec 5, 2023 • 12
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents Paper • 2311.05437 • Published Nov 9, 2023 • 48