VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos Paper • 2411.04923 • Published 8 days ago
CAMEL-Bench: A Comprehensive Arabic LMM Benchmark Paper • 2410.18976 • Published 22 days ago