lxuechen committed
Commit
912b5d5
1 Parent(s): 9fbb65d

Create README.md

Files changed (1)
  1. README.md +32 -0
README.md ADDED
---
license: other
license_name: microsoft-research-license
license_link: https://huggingface.co/microsoft/phi-2/resolve/main/LICENSE
language:
- en
pipeline_tag: text-generation
tags:
- nlp
- code
model-index:
- name: phi-2-dpo
  results:
  - task:
      type: text-generation
    dataset:
      name: AlpacaEval
      type: AlpacaEval
    metrics:
    - name: AlpacaEval
      type: AlpacaEval
      value: 81.37%
    source:
      name: AlpacaEval
      url: https://github.com/tatsu-lab/alpaca_eval
---

## Model Summary

`phi-2-dpo` is an instruction-tuned model built on top of the earlier SFT model [`phi-2-sft`](https://huggingface.co/lxuechen/phi-2-sft). Direct preference optimization (DPO) is used to fine-tune that checkpoint on the [UltraFeedback dataset](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized).
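
The exact training recipe is not documented in this card. As a rough illustration only, a run along these lines could be assembled with `trl`'s `DPOTrainer`; the hyperparameters, the column mapping, and the version-specific API (e.g., `beta` as a constructor argument, as in trl 0.7.x) are all assumptions, not the author's setup:

```python
# Hypothetical sketch of a DPO run on UltraFeedback -- not the published recipe.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "lxuechen/phi-2-sft"  # the SFT checkpoint that DPO starts from
model = AutoModelForCausalLM.from_pretrained(base, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(base, trust_remote_code=True)

ds = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")

def to_pairs(example):
    # chosen/rejected are chat transcripts; keep only the final assistant turn.
    return {
        "prompt": example["prompt"],
        "chosen": example["chosen"][-1]["content"],
        "rejected": example["rejected"][-1]["content"],
    }

ds = ds.map(to_pairs, remove_columns=ds.column_names)

trainer = DPOTrainer(
    model,
    ref_model=None,  # trl clones the policy to act as the frozen reference
    beta=0.1,        # strength of the implicit KL penalty (a common default)
    args=TrainingArguments(
        output_dir="phi-2-dpo",  # placeholder hyperparameters below
        per_device_train_batch_size=2,
        learning_rate=5e-7,
        num_train_epochs=1,
    ),
    train_dataset=ds,
    tokenizer=tokenizer,
    max_length=1024,       # truncation limits for prompt + completion
    max_prompt_length=512,
)
trainer.train()
```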

The purpose of this experiment is to gauge the quality of the pre-trained Phi-2 base model. The good news is that `phi-2-dpo` follows open-ended user instructions well.
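
For a quick qualitative check, generation works through the standard `transformers` API. The sketch below assumes a plain single-turn prompt template, since the card does not document an official one:

```python
# Minimal generation sketch; the "### Human / ### Assistant" template is an
# assumption, not a documented prompt format for this checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "lxuechen/phi-2-dpo"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

prompt = "### Human: Give three tips for writing a good model card.\n\n### Assistant:"
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=256, do_sample=False)

# Decode only the continuation, skipping the echoed prompt tokens.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```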