Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models Paper • 2501.01423 • Published 26 days ago • 36
Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis Paper • 2412.15322 • Published Dec 19, 2024 • 18
Byte Latent Transformer: Patches Scale Better Than Tokens Paper • 2412.09871 • Published Dec 13, 2024 • 89
Tracking Anything with Decoupled Video Segmentation Paper • 2309.03903 • Published Sep 7, 2023 • 27