combines reinforcement learning (RL) and large language models (LLMs) to improve exploration using diverse tool generation during inference
Gabriel Bo
gabrielbo
AI & ML interests
NLP, Scaling, Test-time Compute
Datasets (9)
gabrielbo/swirl-trajectories-mmlu-pro
gabrielbo/explore-rl-hotpota-trajectories
gabrielbo/gpqa-llama-3-8b-verifier
gabrielbo/mmlu-college-llama-3-8b-verifiers
gabrielbo/mmlu-pro-specific-choice-scored
gabrielbo/mmlu-pro-baseline-scored
gabrielbo/mmlu-pro-verifiers-specific-choice
gabrielbo/mmlu-pro-verifiers-baseline
gabrielbo/mmlu-pro-justifications-llama-3