Generate, but Verify: Reducing Hallucination in Vision-Language Models with Retrospective Resampling Paper • 2504.13169 • Published 4 days ago • 36
Teaching Large Language Models to Reason with Reinforcement Learning Paper • 2403.04642 • Published Mar 7, 2024 • 51
LM-Parallel/grpo_llama-hs-v3_bs64_rollout5-lr1e-5-seq-weighted-kl0.01-20250319052012 Updated 7 days ago
LM-Parallel/grpo_llama-hsp-v3_bs64_rollout5-lr1e-5-sw-t1.0-kl0.001-sc10-bm10sbm15-20250411103359 Updated 7 days ago • 2
LM-Parallel/grpo_llama-hsp-v3_bs64_rollout5-lr1e-5-sw-t1.0-kl0.001-bm10-sbm15-nc-20250411054109 Updated 7 days ago
LM-Parallel/grpo_llama-hsp-v3_bs64_rollout5-lr1e-5-sw-t1.0-kl0.01-sc10-bm10sbm15-20250325133311 Updated 7 days ago
Token-Efficient Long Video Understanding for Multimodal LLMs Paper • 2503.04130 • Published Mar 6 • 93