Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization Paper • 2404.09956 • Published Apr 15 • 11
Joint Audio and Symbolic Conditioning for Temporally Controlled Text-to-Music Generation Paper • 2406.10970 • Published Jun 16 • 1
EzAudio: Enhancing Text-to-Audio Generation with Efficient Diffusion Transformer Paper • 2409.10819 • Published 18 days ago • 17
Seed-Music: A Unified Framework for High Quality and Controlled Music Generation Paper • 2409.09214 • Published 21 days ago • 44
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions Paper • 2409.18042 • Published 9 days ago • 33