Unified Multi-Modal Interleaved Document Representation for Information Retrieval Paper • 2410.02729 • Published Oct 3, 2024
TAMP: Token-Adaptive Layerwise Pruning in Multimodal Large Language Models Paper • 2504.09897 • Published Apr 14, 2025
PRInTS: Reward Modeling for Long-Horizon Information Seeking Paper • 2511.19314 • Published Nov 24, 2025 • 8
STELLA: Continual Audio-Video Pre-training with Spatio-Temporal Localized Alignment Paper • 2310.08204 • Published Oct 12, 2023
A History-Aware Visually Grounded Critic for Computer Use Agents Paper • 2606.11078 • Published 3 days ago • 2
StreamGaze: Gaze-Guided Temporal Reasoning and Proactive Understanding in Streaming Videos Paper • 2512.01707 • Published Dec 1, 2025 • 8
Video-Skill-CoT: Skill-based Chain-of-Thoughts for Domain-Adaptive Video Reasoning Paper • 2506.03525 • Published Jun 4, 2025 • 6
VideoRAG: Retrieval-Augmented Generation over Video Corpus Paper • 2501.05874 • Published Jan 10, 2025 • 75