Fish-Speech: Leveraging Large Language Models for Advanced Multilingual Text-to-Speech Synthesis Paper • 2411.01156 • Published Nov 2, 2024 • 6
Wan: Open and Advanced Large-Scale Video Generative Models Paper • 2503.20314 • Published Mar 26, 2025 • 46
Seed-TTS: A Family of High-Quality Versatile Speech Generation Models Paper • 2406.02430 • Published Jun 4, 2024 • 36
WritingBench: A Comprehensive Benchmark for Generative Writing Paper • 2503.05244 • Published Mar 7, 2025 • 17
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models Paper • 2306.07691 • Published Jun 13, 2023 • 8
VBench: Comprehensive Benchmark Suite for Video Generative Models Paper • 2311.17982 • Published Nov 29, 2023 • 9
DiffRhythm: Blazingly Fast and Embarrassingly Simple End-to-End Full-Length Song Generation with Latent Diffusion Paper • 2503.01183 • Published Mar 3, 2025 • 26
Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction Paper • 2502.11946 • Published Feb 17, 2025 • 2
Step-Audio Collection Step-Audio model family, including Audio-Tokenizer, Audio-Chat and TTS • 3 items • Updated Feb 17 • 30
IndexTTS: An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System Paper • 2502.05512 • Published Feb 8, 2025 • 2
VITS2: Improving Quality and Efficiency of Single-Stage Text-to-Speech with Adversarial Learning and Architecture Design Paper • 2307.16430 • Published Jul 31, 2023 • 4
Learning Flow Fields in Attention for Controllable Person Image Generation Paper • 2412.08486 • Published Dec 11, 2024 • 36
TryOffDiff: Virtual-Try-Off via High-Fidelity Garment Reconstruction using Diffusion Models Paper • 2411.18350 • Published Nov 27, 2024 • 28