RogerZhuo

AI & ML interests

None yet

Recent Activity

updated a collection 5 days ago

TTS

liked a model 5 days ago

HKUSTAudio/Llasa-3B

updated a collection 6 days ago

OCR

Organizations

RogerZhuo's activity

upvoted a paper 7 days ago

Fish-Speech: Leveraging Large Language Models for Advanced Multilingual Text-to-Speech Synthesis

Paper • 2411.01156 • Published Nov 2, 2024 • 6

upvoted 3 papers 8 days ago

upvoted a paper 11 days ago

Reinforcement Learning: An Overview

Paper • 2412.05265 • Published Dec 6, 2024 • 5

upvoted a paper 13 days ago

Seed-TTS: A Family of High-Quality Versatile Speech Generation Models

Paper • 2406.02430 • Published Jun 4, 2024 • 36

upvoted 2 papers 23 days ago

WritingBench: A Comprehensive Benchmark for Generative Writing

Paper • 2503.05244 • Published 29 days ago • 17

VACE: All-in-One Video Creation and Editing

Paper • 2503.07598 • Published 26 days ago • 43

upvoted a paper 27 days ago

StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models

Paper • 2306.07691 • Published Jun 13, 2023 • 8

upvoted 3 papers about 1 month ago

VBench: Comprehensive Benchmark Suite for Video Generative Models

Paper • 2311.17982 • Published Nov 29, 2023 • 9

DiffRhythm: Blazingly Fast and Embarrassingly Simple End-to-End Full-Length Song Generation with Latent Diffusion

Paper • 2503.01183 • Published Mar 3 • 26

Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction

Paper • 2502.11946 • Published Feb 17 • 2

upvoted a collection about 1 month ago

Step-Audio

Collection

Step-Audio model family, including Audio-Tokenizer, Audio-Chat and TTS • 3 items • Updated Feb 17 • 30

upvoted 2 papers about 1 month ago

IndexTTS: An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System

Paper • 2502.05512 • Published Feb 8 • 2

VITS2: Improving Quality and Efficiency of Single-Stage Text-to-Speech with Adversarial Learning and Architecture Design

Paper • 2307.16430 • Published Jul 31, 2023 • 4

upvoted a collection about 1 month ago

DeepSeek-V3

Collection

4 items • Updated 11 days ago • 232

upvoted 2 papers about 1 month ago

Learning Flow Fields in Attention for Controllable Person Image Generation

Paper • 2412.08486 • Published Dec 11, 2024 • 36

TryOffDiff: Virtual-Try-Off via High-Fidelity Garment Reconstruction using Diffusion Models

Paper • 2411.18350 • Published Nov 27, 2024 • 28