Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation Paper • 2412.04432 • Published 20 days ago • 14
Moto: Latent Motion Token as the Bridging Language for Robot Manipulation Paper • 2412.04445 • Published 20 days ago • 21
InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation Paper • 2307.06942 • Published Jul 13, 2023 • 22