Exploration with a more stable RL pipeline with outcome-only reward and scaled-up LLMs.
Bowen
PeterJinGo
Recent Activity
updated a model about 18 hours ago: rubricrm/qwen2.5_7B_LR1.0e-6_evidence_rubric_4k4k_separate_PPO
published a model about 18 hours ago: rubricrm/qwen2.5_7B_LR1.0e-6_evidence_rubric_4k4k_separate_PPO
updated a model 3 days ago: rubricrm/qwen2.5_7B_LR5.0e-6_evidence_rubric_4k4k_separate_reward_function_largeBz
Collections
2
Preliminary checkpoints with outcome-only RL.
- Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
  Paper • 2503.09516 • Published • 27
- PeterJinGo/SearchR1-nq_hotpotqa_train-llama3.2-3b-em-ppo
  Updated • 1.21k
- PeterJinGo/SearchR1-nq_hotpotqa_train-llama3.2-3b-em-grpo
  Updated • 5
- PeterJinGo/SearchR1-nq_hotpotqa_train-llama3.2-3b-it-em-ppo
  Updated • 29
models
31
PeterJinGo/SearchR1-nq_hotpotqa_train-qwen2.5-14b-em-ppo-v0.2
Updated • 1
PeterJinGo/SearchR1-nq_hotpotqa_train-qwen2.5-14b-it-em-ppo-v0.2
Updated • 1
PeterJinGo/R1-nq_hotpotqa_train-qwen2.5-3b-em-ppo-v0.2
Updated • 4
PeterJinGo/R1-nq_hotpotqa_train-qwen2.5-3b-it-em-ppo-v0.2
Updated • 23
PeterJinGo/R1-nq_hotpotqa_train-qwen2.5-14b-em-ppo-v0.2
Updated
PeterJinGo/R1-nq_hotpotqa_train-qwen2.5-14b-it-em-ppo-v0.2
Updated • 3
PeterJinGo/R1-nq_hotpotqa_train-qwen2.5-7b-it-em-ppo-v0.2
Updated • 3
PeterJinGo/R1-nq_hotpotqa_train-qwen2.5-7b-em-ppo-v0.2
Updated • 3
PeterJinGo/SearchR1-nq_hotpotqa_train-qwen2.5-7b-it-em-grpo-v0.2
Updated • 14
PeterJinGo/SearchR1-nq_hotpotqa_train-qwen2.5-7b-em-grpo-v0.2
Updated • 4
datasets
13
PeterJinGo/wiki-18-e5-index-HNSW64
Updated • 29
PeterJinGo/wiki-18-bm25-index
Updated • 44
PeterJinGo/nq_hotpotqa_train
Viewer • Updated • 221k • 496 • 1
PeterJinGo/wiki-18-e5-index
Updated • 1.96k
PeterJinGo/wiki-18-corpus
Updated • 777
PeterJinGo/ultrafeedback_first_5000
Viewer • Updated • 5k • 11
PeterJinGo/gsm8k-chat
Viewer • Updated • 7.47k • 25
PeterJinGo/math-zeroshot-chat
Viewer • Updated • 7.5k • 29
PeterJinGo/math-zeroshot
Viewer • Updated • 7.5k • 25
PeterJinGo/math2
Viewer • Updated • 7.5k • 23