JW17 committed
Commit c3693da · verified · 1 parent: e74a696

Add paper link

Files changed (1): README.md +14 -1
````diff
@@ -160,7 +160,7 @@ model-index:
 ---
 # **Mistral-ORPO-β (7B)**
 
-**Mistral-ORPO** is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) using the *odds ratio preference optimization (ORPO)*. With ORPO, the model directly learns the preference without the supervised fine-tuning warmup phase. **Mistral-ORPO-β** is fine-tuned exclusively on the 61k instances of the cleaned version of UltraFeedback, [argilla/ultrafeedback-binarized-preferences-cleaned](https://huggingface.co/datasets/argilla/ultrafeedback-binarized-preferences-cleaned), by [Argilla](https://huggingface.co/argilla).
+**Mistral-ORPO** is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) using the *[odds ratio preference optimization (ORPO)](https://arxiv.org/abs/2403.07691)*. With ORPO, the model directly learns the preference without the supervised fine-tuning warmup phase. **Mistral-ORPO-β** is fine-tuned exclusively on the 61k instances of the cleaned version of UltraFeedback, [argilla/ultrafeedback-binarized-preferences-cleaned](https://huggingface.co/datasets/argilla/ultrafeedback-binarized-preferences-cleaned), by [Argilla](https://huggingface.co/argilla).
 
 - **Github Repository**: https://github.com/xfactlab/orpo
 
@@ -214,4 +214,17 @@ response = tokenizer.batch_decode(output)
 #Hi! How are you doing?</s>
 #<|assistant|>
 #I'm doing well, thank you! How are you?</s>
+```
+
+## 📎 **Citation**
+
+```
+@misc{hong2024orpo,
+title={ORPO: Monolithic Preference Optimization without Reference Model},
+author={Jiwoo Hong and Noah Lee and James Thorne},
+year={2024},
+eprint={2403.07691},
+archivePrefix={arXiv},
+primaryClass={cs.CL}
+}
 ```
````
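The README paragraph edited in this commit states that with ORPO the model learns the preference directly, without a supervised fine-tuning warmup phase. The following is a minimal sketch of how that can work, assuming the objective described in the linked paper (arXiv 2403.07691). In that formulation, the loss adds an odds-ratio penalty to the ordinary negative log-likelihood on the chosen response, so a single training stage covers both. The function names `log_odds`, `log_sigmoid`, and `orpo_loss` and the default weight `lam=0.1` are illustrative assumptions, not code from the xfactlab/orpo repository. Per-token log-probabilities are passed in as plain Python lists instead of being computed by a model.

```python
import math
from typing import Sequence


def log_odds(avg_logp: float) -> float:
    """Return log(P / (1 - P)) for P = exp(avg_logp).

    avg_logp must be strictly negative (0 < P < 1); log(1 - P) is computed
    as log(-expm1(avg_logp)) to stay accurate when P is near 0 or 1.
    """
    if avg_logp >= 0.0:
        raise ValueError("average log-probability must be < 0")
    return avg_logp - math.log(-math.expm1(avg_logp))


def log_sigmoid(z: float) -> float:
    """Numerically stable log(1 / (1 + exp(-z)))."""
    if z >= 0.0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def orpo_loss(
    chosen_token_logps: Sequence[float],
    rejected_token_logps: Sequence[float],
    lam: float = 0.1,  # hypothetical weight for the odds-ratio term
) -> float:
    """Sketch of an ORPO-style loss for one (chosen, rejected) pair."""
    # Length-normalized log-likelihood of each response given the prompt.
    avg_chosen = sum(chosen_token_logps) / len(chosen_token_logps)
    avg_rejected = sum(rejected_token_logps) / len(rejected_token_logps)

    # SFT term: negative log-likelihood of the chosen response.
    nll = -avg_chosen

    # Odds-ratio term: penalize the model unless the chosen response
    # has higher odds than the rejected one.
    log_odds_ratio = log_odds(avg_chosen) - log_odds(avg_rejected)
    odds_ratio_loss = -log_sigmoid(log_odds_ratio)

    return nll + lam * odds_ratio_loss


# Hypothetical per-token log-probabilities for one preference pair.
chosen = [-0.5, -0.5]
rejected = [-2.0, -2.0]
print(round(orpo_loss(chosen, rejected), 4))
```

Swapping `chosen` and `rejected` in this example produces a larger loss, because the model is then assigning higher odds to the dispreferred response. In an actual trainer, the averages would be taken over the response tokens only and computed on tensors rather than with `math`.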