This&That: Language-Gesture Controlled Video Generation for Robot Planning Paper • 2407.05530 • Published Jul 8, 2024
SoTA with Less: MCTS-Guided Sample Selection for Data-Efficient Visual Reasoning Self-Improvement Paper • 2504.07934 • Published 14 days ago
Community Forensics: Using Thousands of Generators to Train Fake Image Detectors Paper • 2411.04125 • Published Nov 6, 2024
Improving Vision-and-Language Navigation with Image-Text Pairs from the Web Paper • 2004.14973 • Published Apr 30, 2020
Self-Supervised Any-Point Tracking by Contrastive Random Walks Paper • 2409.16288 • Published Sep 24, 2024
EXIF as Language: Learning Cross-Modal Associations Between Images and Camera Metadata Paper • 2301.04647 • Published Jan 11, 2023
VISITRON: Visual Semantics-Aligned Interactively Trained Object-Navigator Paper • 2105.11589 • Published May 25, 2021
Sim-to-Real Transfer for Vision-and-Language Navigation Paper • 2011.03807 • Published Nov 7, 2020
Chasing Ghosts: Instruction Following as Bayesian State Tracking Paper • 1907.02022 • Published Jul 3, 2019
AVA-AVD: Audio-Visual Speaker Diarization in the Wild Paper • 2111.14448 • Published Nov 29, 2021
Self-Supervised Video Forensics by Audio-Visual Anomaly Detection Paper • 2301.01767 • Published Jan 4, 2023
Knowledge Solver: Teaching LLMs to Search for Domain Knowledge from Knowledge Graphs Paper • 2309.03118 • Published Sep 6, 2023
Vision-Flan: Scaling Human-Labeled Tasks in Visual Instruction Tuning Paper • 2402.11690 • Published Feb 18, 2024