Perceiver: General Perception with Iterative Attention • Paper • 2103.03206 • Published Mar 4, 2021 • 1
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking • Paper • 2501.04519 • Published 4 days ago • 185
Cosmos World Foundation Model Platform for Physical AI • Paper • 2501.03575 • Published 5 days ago • 54
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models • Paper • 2501.03262 • Published 8 days ago • 72
High-Fidelity Audio Compression with Improved RVQGAN • Paper • 2306.06546 • Published Jun 11, 2023 • 10
Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning • Paper • 1907.04448 • Published Jul 9, 2019 • 1
SDPO: Segment-Level Direct Preference Optimization for Social Agents • Paper • 2501.01821 • Published 9 days ago • 18
VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction • Paper • 2501.01957 • Published 9 days ago • 33
Fewer-token Neural Speech Codec with Time-invariant Codes • Paper • 2310.00014 • Published Sep 15, 2023 • 2
Ensembling Large Language Models with Process Reward-Guided Tree Search for Better Complex Reasoning • Paper • 2412.15797 • Published 23 days ago • 17
PartGen: Part-level 3D Generation and Reconstruction with Multi-View Diffusion Models • Paper • 2412.18608 • Published 19 days ago • 14
ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing • Paper • 2412.14711 • Published 24 days ago • 15
In Case You Missed It: ARC 'Challenge' Is Not That Challenging • Paper • 2412.17758 • Published 20 days ago • 16
CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings • Paper • 2501.01257 • Published 10 days ago • 45
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining • Paper • 2501.00958 • Published 11 days ago • 91
OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis • Paper • 2412.19723 • Published 16 days ago • 78
DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation • Paper • 2412.18597 • Published 19 days ago • 19