zhezi12138 committed on
Commit cb9e159 · verified · 1 Parent(s): 811a4ff

Update README.md

Files changed (1)
  1. README.md +10 -3
README.md CHANGED
@@ -1,3 +1,10 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ datasets:
+ - RLHFlow/iterative-prompt-v1-iter1-20K
+ language:
+ - en
+ base_model:
+ - openlm-research/open_llama_3b_v2
+ ---
+ This model is for reproducing the results on the Iterative-Prompt dataset from the paper "The crucial role of samplers in online direct preference optimization". To use it, download the model, save it as "models/rlhflow_iter1", and then start the training pipeline. Because we have retrained the models, the results may differ slightly from those reported in the paper, and we will update it later.
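
The download step described in the README text could be sketched as follows. This is not part of the commit; it is a minimal illustration that assumes the `huggingface_hub` package is installed. The helper name `download_model` is hypothetical, and the repo id is a placeholder because the commit page does not state it. Only the target directory `models/rlhflow_iter1` comes from the README.

```python
from pathlib import Path

# Target directory named in the README text.
LOCAL_DIR = Path("models/rlhflow_iter1")


def download_model(repo_id: str, local_dir: Path = LOCAL_DIR) -> Path:
    """Download all files of a Hub model repo into local_dir and return that path.

    Hypothetical helper for illustration; repo_id must be supplied by the caller.
    """
    # Imported lazily so this module can be loaded without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    local_dir.mkdir(parents=True, exist_ok=True)
    snapshot_download(repo_id=repo_id, local_dir=str(local_dir))
    return local_dir


# Example call (placeholder repo id; substitute this model's actual id on the Hub):
# download_model("<user>/<model-repo>")
```

After the files are in place, the training pipeline mentioned in the README can be started from the directory that contains `models/`.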