Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models Paper • 2503.09573 • Published about 12 hours ago • 8 • 1
Gemini Embedding: Generalizable Embeddings from Gemini Paper • 2503.07891 • Published 2 days ago • 21 • 2
AlphaDrive: Unleashing the Power of VLMs in Autonomous Driving via Reinforcement Learning and Reasoning Paper • 2503.07608 • Published 2 days ago • 15 • 1
Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models Paper • 2503.06749 • Published 3 days ago • 20 • 2
R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning Paper • 2503.05592 • Published 6 days ago • 24 • 2
R1-Omni: Explainable Omni-Multimodal Emotion Recognition with Reinforcing Learning Paper • 2503.05379 • Published 6 days ago • 22 • 3
R1-Zero's "Aha Moment" in Visual Reasoning on a 2B Non-SFT Model Paper • 2503.05132 • Published 6 days ago • 42 • 2
TinyR1-32B-Preview: Boosting Accuracy with Branch-Merge Distillation Paper • 2503.04872 • Published 7 days ago • 14 • 2
Learning from Failures in Multi-Attempt Reinforcement Learning Paper • 2503.04808 • Published 9 days ago • 15 • 2
Token-Efficient Long Video Understanding for Multimodal LLMs Paper • 2503.04130 • Published 7 days ago • 76 • 2
The Best of Both Worlds: Integrating Language Models and Diffusion Models for Video Generation Paper • 2503.04606 • Published 7 days ago • 7 • 1
Dedicated Feedback and Edit Models Empower Inference-Time Scaling for Open-Ended General-Domain Tasks Paper • 2503.04378 • Published 7 days ago • 6 • 3
GEN3C: 3D-Informed World-Consistent Video Generation with Precise Camera Control Paper • 2503.03751 • Published 7 days ago • 19 • 4
Sim-to-Real Reinforcement Learning for Vision-Based Dexterous Manipulation on Humanoids Paper • 2502.20396 • Published 13 days ago • 12 • 2
HAIC: Improving Human Action Understanding and Generation with Better Captions for Multi-modal Large Language Models Paper • 2502.20811 • Published 13 days ago • 2 • 2
SoS1: O1 and R1-Like Reasoning LLMs are Sum-of-Square Solvers Paper • 2502.20545 • Published 13 days ago • 20 • 2
Mobius: Text to Seamless Looping Video Generation via Latent Shift Paper • 2502.20307 • Published 14 days ago • 17 • 2
FlexiDiT: Your Diffusion Transformer Can Easily Generate High-Quality Samples with Less Compute Paper • 2502.20126 • Published 14 days ago • 20 • 2