selfcorrexp2/llama3_sft_balanced_corr_rr0k_ep3_train_on_reasoning Text Generation • Updated Jan 8 • 22
selfcorrexp2/selfcorrexp2_llama3_openmath_1m_ep1_tmp10_goldrm_labeled Viewer • Updated Jan 23 • 15k • 84
selfcorrexp2/HanningZhang_Llama3-sft-more-corr-rr60k-3ep_moredatatmp10_vllmexp3 Viewer • Updated Jan 23 • 15k • 119
selfcorrexp2/HanningZhang_Llama3-sft-more-corr-rr60k-3ep_moredatatmp10 Viewer • Updated Jan 23 • 15k • 79
selfcorrexp2/HanningZhang_Llama3-sft-more-corr-rr60k-3ep_moredatatmp10_gold_reward Viewer • Updated Jan 23 • 15k • 82
selfcorrexp2/balanced_self_rewarding_rm_labeled_llama3_sft_gen_1round_prompt Viewer • Updated Jan 23 • 15k • 95
selfcorrexp2/llama3_sft_more_corr_rr0k_3ep_more_datatmp10_vllmexp3 Viewer • Updated Jan 23 • 15k • 95