cj453/dense_reward_trainer_final_opt__NumTrainEpochs2_SaveStrategiesepoch_reward_modeling_anthropic_hh Updated Sep 14 • 6
cj453/dense_reward_trainer_final_opt__NumTrainEpochs2_SaveStrategiesno_reward_modeling_anthropic_hh Updated Sep 15 • 6
cj453/dense_reward_trainer_final_opt__NumTrainEpochs5_SaveStrategiesepoch_reward_modeling_anthropic_hh Updated Sep 16 • 6
cj453/dense_reward_trainer_final_opt__NumTrainEpochs5_SaveStrategiesno_reward_modeling_anthropic_hh Updated Sep 16 • 6