VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM Paper • 2501.00599 • Published 13 days ago • 40
Apollo: An Exploration of Video Understanding in Large Multimodal Models Paper • 2412.10360 • Published about 1 month ago • 137
FLOAT: Generative Motion Latent Flow Matching for Audio-driven Talking Portrait Paper • 2412.01064 • Published Dec 2, 2024 • 26
SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints Paper • 2412.07760 • Published Dec 10, 2024 • 50
Open-Sora Plan: Open-Source Large Video Generation Model Paper • 2412.00131 • Published Nov 28, 2024 • 33
X-Prompt: Towards Universal In-Context Image Generation in Auto-Regressive Vision Language Foundation Models Paper • 2412.01824 • Published Dec 2, 2024 • 65