RogerZhuo
AI & ML interests
None yet
Recent Activity
upvoted a collection about 1 month ago: DFlash
liked a model about 1 month ago: z-lab/gemma-4-31B-it-DFlash
liked a model about 1 month ago: baidu/ERNIE-Image
Organizations
Reading
Music
-
ElectricAlexis/NotaGen
Updated • 154 -
ASLP-lab/LLaSE-G1
Audio-to-Audio • Updated • 28 -
Di♪♪Rhythm
Running on Zero • Agents • Featured • 688
🎶 Blazingly Fast and Embarrassingly Simple Song Generation
-
DiffRhythm: Blazingly Fast and Embarrassingly Simple End-to-End Full-Length Song Generation with Latent Diffusion
Paper • 2503.01183 • Published • 29
AI Arena
I2V
image-to-video
-
Wan-AI/Wan2.1-T2V-1.3B
Text-to-Video • Updated • 40.7k • 458 -
VBench: Comprehensive Benchmark Suite for Video Generative Models
Paper • 2311.17982 • Published • 9 -
VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models
Paper • 2411.13503 • Published • 34 -
tencent/HunyuanVideo-I2V
Image-to-Video • Updated • 335 • 352
LLM
Foundation large-model related
must-read-papers
AI Papers
-
Reinforcement Learning: An Overview
Paper • 2412.05265 • Published • 8 -
Fish-Speech: Leveraging Large Language Models for Advanced Multilingual Text-to-Speech Synthesis
Paper • 2411.01156 • Published • 13 -
VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness
Paper • 2503.21755 • Published • 33 -
Qwen2.5-Omni Technical Report
Paper • 2503.20215 • Published • 173
OCR
images
images
-
black-forest-labs/FLUX.1-dev
Text-to-Image • Updated • 958k • 13.1k -
cagliostrolab/animagine-xl-4.0
Text-to-Image • Updated • 265k • 428 -
Thera Arbitrary-Scale Super-Resolution
Runtime error • Agents • Featured • 283
🔥 Upscale photos to any size with neural super-resolution
-
stepfun-ai/Step1X-Edit
Image-to-Image • Updated • 126 • 332
TTS
Speech related
-
VITS2: Improving Quality and Efficiency of Single-Stage Text-to-Speech with Adversarial Learning and Architecture Design
Paper • 2307.16430 • Published • 4 -
Zyphra/Zonos-v0.1-transformer
Text-to-Speech • 2B • Updated • 9.68k • 434 -
IndexTTS: An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
Paper • 2502.05512 • Published • 7 -
Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction
Paper • 2502.11946 • Published • 3
virtual try-on
Virtual makeover
-
Learning Flow Fields in Attention for Controllable Person Image Generation
Paper • 2412.08486 • Published • 36 -
franciszzj/Leffa
Image-to-Image • Updated • 345 -
TryOffDiff: Virtual-Try-Off via High-Fidelity Garment Reconstruction using Diffusion Models
Paper • 2411.18350 • Published • 28 -
TryOffDiff
Running on Zero • Agents • 64
🔥 Extract garment images from everyday images!
Data