This collection contains datasets and models related to "BLEUBERI: BLEU is a surprisingly effective reward for instruction following".
- BLEUBERI: BLEU is a surprisingly effective reward for instruction following
  Paper • arXiv:2505.11080
- yapeichang/BLEUBERI-Tulu3-50k
  Dataset • 50k
- yapeichang/Qwen2.5-7B-BLEUBERI
  Text Generation
- yapeichang/Qwen2.5-7B-RM8B
  Text Generation