MoCha: Towards Movie-Grade Talking Character Synthesis • Paper • 2503.23307 • Published 16 days ago • 120
TextCrafter: Accurately Rendering Multiple Texts in Complex Visual Scenes • Paper • 2503.23461 • Published 16 days ago • 93
When To Solve, When To Verify: Compute-Optimal Problem Solving and Generative Verification for LLM Reasoning • Paper • 2504.01005 • Published 14 days ago • 15
GeometryCrafter: Consistent Geometry Estimation for Open-world Videos with Diffusion Priors • Paper • 2504.01016 • Published 14 days ago • 28
Exploring the Effect of Reinforcement Learning on Video Understanding: Insights from SEED-Bench-R1 • Paper • 2503.24376 • Published 15 days ago • 37
MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization • Paper • 2504.00999 • Published 14 days ago • 78
Quamba2: A Robust and Scalable Post-training Quantization Framework for Selective State Space Models • Paper • 2503.22879 • Published 18 days ago • 10
DreamActor-M1: Holistic, Expressive and Robust Human Image Animation with Hybrid Guidance • Paper • 2504.01724 • Published 13 days ago • 61
AnimeGamer: Infinite Anime Life Simulation with Next Game State Prediction • Paper • 2504.01014 • Published 14 days ago • 59
ShortV: Efficient Multimodal Large Language Models by Freezing Visual Tokens in Ineffective Layers • Paper • 2504.00502 • Published 14 days ago • 21
ZClip: Adaptive Spike Mitigation for LLM Pre-Training • Paper • 2504.02507 • Published 12 days ago • 74
Inference-Time Scaling for Generalist Reward Modeling • Paper • 2504.02495 • Published 12 days ago • 52