selfcorrexp2/type12_math_augmath_dpo_sftlossbeta05_step400 Text Generation • Updated 16 days ago • 26
selfcorrexp2/llama3_sft_less_corr_rr0k_ep3_train_on_reasoning Text Generation • Updated 25 days ago • 59
selfcorrexp2/llama3_sft_balanced_corr_rr0k_ep3_train_on_reasoning Text Generation • Updated 26 days ago • 35
selfcorrexp2/selfcorrexp2_llama3_openmath_1m_ep1_tmp10_goldrm_labeled Viewer • Updated 11 days ago • 15k • 65
selfcorrexp2/HanningZhang_Llama3-sft-more-corr-rr60k-3ep_moredatatmp10_vllmexp3 Viewer • Updated 11 days ago • 15k • 59
selfcorrexp2/HanningZhang_Llama3-sft-more-corr-rr60k-3ep_moredatatmp10 Viewer • Updated 11 days ago • 15k • 57
selfcorrexp2/HanningZhang_Llama3-sft-more-corr-rr60k-3ep_moredatatmp10_gold_reward Viewer • Updated 11 days ago • 15k • 24
selfcorrexp2/balanced_self_rewarding_rm_labeled_llama3_sft_gen_1round_prompt Viewer • Updated 11 days ago • 15k • 26
selfcorrexp2/llama3_sft_more_corr_rr0k_3ep_more_datatmp10_vllmexp3 Viewer • Updated 11 days ago • 15k • 29