VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding Paper • 2501.13106 • Published 14 days ago
Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding Paper • 2311.16922 • Published Nov 28, 2023
MS-DETR: Natural Language Video Localization with Sampling Moment-Moment Interaction Paper • 2305.18969 • Published May 30, 2023