GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization Paper • 2601.05242 • Published 29 days ago • 221
Vision-DeepResearch: Incentivizing DeepResearch Capability in Multimodal Large Language Models Paper • 2601.22060 • Published 8 days ago • 145
TranslateGemma VLLM Collection Modified version of google/translategemma-4/12/27b-it optimized for deployment with vLLM. • 3 items • Updated 3 days ago • 1
Distribution-Aligned Sequence Distillation for Superior Long-CoT Reasoning Paper • 2601.09088 • Published 24 days ago • 62