zhezi12138/llama-3b-iter-1


zhezi12138 committed 22 days ago

Commit cb9e159 (verified) · 1 parent: 811a4ff

Update README.md

Files changed (1)

README.md +10 -3

README.md CHANGED

@@ -1,3 +1,10 @@
----
-license: mit
----

+---
+license: mit
+datasets:
+- RLHFlow/iterative-prompt-v1-iter1-20K
+language:
+- en
+base_model:
+- openlm-research/open_llama_3b_v2
+---
+This model is for reproducing the Iterative-Prompt dataset results in the paper "The Crucial Role of Samplers in Online Direct Preference Optimization". Download it, save it to "models/rlhflow_iter1", and then start the training pipeline. Because we have retrained the models, the results may differ slightly from those reported in the paper; we will update the reported results later.
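The README asks you to download the checkpoint and save it as `models/rlhflow_iter1` before starting the training pipeline. A minimal sketch of that step follows, assuming the `huggingface_hub` package is installed and that the repo id is the one shown in this page's breadcrumb. The helper name `download_checkpoint` is hypothetical, and the training pipeline itself is not shown because the README does not describe it.

```python
from pathlib import Path

# Hub repo id, assumed from this page's breadcrumb (zhezi12138/llama-3b-iter-1).
REPO_ID = "zhezi12138/llama-3b-iter-1"
# Target directory named in the README; the training pipeline is assumed to read from it.
LOCAL_DIR = Path("models") / "rlhflow_iter1"


def download_checkpoint(local_dir: Path = LOCAL_DIR) -> Path:
    """Hypothetical helper: copy every file in the model repo into local_dir."""
    # Imported lazily so the constants above can be inspected without the dependency.
    from huggingface_hub import snapshot_download

    local_dir.mkdir(parents=True, exist_ok=True)
    # snapshot_download fetches the whole repo snapshot into local_dir.
    snapshot_download(repo_id=REPO_ID, local_dir=str(local_dir))
    return local_dir
```

Calling `download_checkpoint()` once from the repository root should leave the model files under `models/rlhflow_iter1`. The `huggingface_hub` import is deferred so that the paths can be checked or overridden before any network access happens.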