lxuechen committed
Commit
912b5d5
1 Parent(s): 9fbb65d

Create README.md

Files changed (1)
  1. README.md +32 -0
README.md ADDED
---
license: other
license_name: microsoft-research-license
license_link: https://huggingface.co/microsoft/phi-2/resolve/main/LICENSE
language:
- en
pipeline_tag: text-generation
tags:
- nlp
- code
model-index:
- name: phi-2-dpo
  results:
  - task:
      type: text-generation
    dataset:
      name: AlpacaEval
      type: AlpacaEval
    metrics:
    - name: AlpacaEval
      type: AlpacaEval
      value: 81.37%
    source:
      name: AlpacaEval
      url: https://github.com/tatsu-lab/alpaca_eval
---

## Model Summary

`phi-2-dpo` is an instruction-tuned model built on top of the earlier SFT model [`phi-2-sft`](https://huggingface.co/lxuechen/phi-2-sft). Direct preference optimization (DPO) is used to fine-tune that checkpoint on the [UltraFeedback dataset](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized).
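
The exact training recipe is not documented in this card. As a rough illustration only, a run along these lines could be assembled with `trl`'s `DPOTrainer`; the hyperparameters, the column mapping, and the version-specific API (e.g., `beta` as a constructor argument, as in trl 0.7.x) are all assumptions, not the author's setup:

```python
# Hypothetical sketch of a DPO run on UltraFeedback -- not the published recipe.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "lxuechen/phi-2-sft"  # the SFT checkpoint that DPO starts from
model = AutoModelForCausalLM.from_pretrained(base, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(base, trust_remote_code=True)

ds = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")

def to_pairs(example):
    # chosen/rejected are chat transcripts; keep only the final assistant turn.
    return {
        "prompt": example["prompt"],
        "chosen": example["chosen"][-1]["content"],
        "rejected": example["rejected"][-1]["content"],
    }

ds = ds.map(to_pairs, remove_columns=ds.column_names)

trainer = DPOTrainer(
    model,
    ref_model=None,  # trl clones the policy to act as the frozen reference
    beta=0.1,        # strength of the implicit KL penalty (a common default)
    args=TrainingArguments(
        output_dir="phi-2-dpo",  # placeholder hyperparameters below
        per_device_train_batch_size=2,
        learning_rate=5e-7,
        num_train_epochs=1,
    ),
    train_dataset=ds,
    tokenizer=tokenizer,
    max_length=1024,       # truncation limits for prompt + completion
    max_prompt_length=512,
)
trainer.train()
```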

The purpose of this experiment is to gauge the quality of the pre-trained Phi-2 base model. The good news is that `phi-2-dpo` follows open-ended user instructions well.
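
For a quick qualitative check, generation works through the standard `transformers` API. The sketch below assumes a plain single-turn prompt template, since the card does not document an official one:

```python
# Minimal generation sketch; the "### Human / ### Assistant" template is an
# assumption, not a documented prompt format for this checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "lxuechen/phi-2-dpo"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

prompt = "### Human: Give three tips for writing a good model card.\n\n### Assistant:"
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=256, do_sample=False)

# Decode only the continuation, skipping the echoed prompt tokens.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```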