LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM Paper • 2503.04724 • Published 3 days ago • 47
OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference Paper • 2502.18411 • Published 12 days ago • 69
Slamming: Training a Speech Language Model on One GPU in a Day Paper • 2502.15814 • Published 18 days ago • 66
Soundwave: Less is More for Speech-Text Alignment in LLMs Paper • 2502.12900 • Published 19 days ago • 76
AuraFusion360: Augmented Unseen Region Alignment for Reference-based 360° Unbounded Scene Inpainting Paper • 2502.05176 • Published 30 days ago • 32
Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis Paper • 2502.04128 • Published Feb 6 • 24
Recipe: Preparing Multilingual Speech Datasets for TTS Training Article • By PHBJT and 1 other • Nov 4, 2024 • 18
MinMo: A Multimodal Large Language Model for Seamless Voice Interaction Paper • 2501.06282 • Published Jan 10 • 48
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching Paper • 2410.06885 • Published Oct 9, 2024 • 44