Illustrating Reinforcement Learning from Human Feedback (RLHF) Article • Dec 9, 2022 • 219
Implicit Reasoning in Transformers is Reasoning through Shortcuts Paper • 2503.07604 • Published 25 days ago • 21
LightGen: Efficient Image Generation through Knowledge Distillation and Direct Preference Optimization Paper • 2503.08619 • Published 24 days ago • 20
Gemini Embedding: Generalizable Embeddings from Gemini Paper • 2503.07891 • Published 25 days ago • 34
UniF^2ace: Fine-grained Face Understanding and Generation with Unified Multimodal Models Paper • 2503.08120 • Published 25 days ago • 30
MagicInfinite: Generating Infinite Talking Videos with Your Words and Voice Paper • 2503.05978 • Published 28 days ago • 34
LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL Paper • 2503.07536 • Published 25 days ago • 83
Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia Paper • 2503.07920 • Published 25 days ago • 95
Self-Taught Self-Correction for Small Language Models Paper • 2503.08681 • Published 24 days ago • 13
GTR: Guided Thought Reinforcement Prevents Thought Collapse in RL-based VLM Agent Training Paper • 2503.08525 • Published 24 days ago • 15
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning Paper • 2503.09516 • Published 23 days ago • 27