Multimodal Music Generation with Explicit Bridges and Retrieval Augmentation Paper • 2412.09428 • Published 14 days ago • 7
VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection Paper • 2411.14794 • Published Nov 22 • 12
Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT Paper • 2406.18583 • Published Jun 5
Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers Paper • 2405.05945 • Published May 9 • 2
Video Background Music Generation: Dataset, Method and Evaluation Paper • 2211.11248 • Published Nov 21, 2022
LyricWhiz: Robust Multilingual Zero-shot Lyrics Transcription by Whispering to ChatGPT Paper • 2306.17103 • Published Jun 29, 2023 • 1
LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation Paper • 2408.15881 • Published Aug 28 • 21
Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining Paper • 2408.02657 • Published Aug 5 • 33