Benchmark Designers Should "Train on the Test Set" to Expose Exploitable Non-Visual Shortcuts Paper • 2511.04655 • Published 14 days ago • 7
COLA: How to adapt vision-language models to Compose Objects Localized with Attributes? Paper • 2305.03689 • Published May 5, 2023 • 3
SIMS-V: Simulated Instruction-Tuning for Spatial Video Understanding Paper • 2511.04668 • Published 14 days ago • 4
SAT: Dynamic Spatial Aptitude Training for Multimodal Language Models Paper • 2412.07755 • Published Dec 10, 2024 • 2
Coarse Correspondence Elicit 3D Spacetime Understanding in Multimodal Language Model Paper • 2408.00754 • Published Aug 1, 2024 • 24