Update README.md
README.md (changed):

```diff
@@ -3,6 +3,7 @@ base_model: google/gemma-2-9b-it
 library_name: transformers
 datasets:
 - openbmb/UltraFeedback
+- wzhouad/gemma-2-ultrafeedback-hybrid
 tags:
 - alignment-handbook
 - gemma
@@ -18,6 +19,8 @@ gemma-2-9b-it finetuned by hybrid WPO, utilizing two types of data:
 
 In comparison to the preference data construction method in our paper, we switch to RLHFlow/ArmoRM-Llama3-8B-v0.1 to score the outputs, and choose the outputs with maximum/minimum scores to form a preference pair.
 
+We provide our training data at [wzhouad/gemma-2-ultrafeedback-hybrid](https://huggingface.co/datasets/wzhouad/gemma-2-ultrafeedback-hybrid)
+
 ### [AlpacaEval Eval Results](https://tatsu-lab.github.io/alpaca_eval/)
 | Model | LC | WR | Avg. Length |
 |-------------------------------------------|:------------:|:--------:|:-----------:|
```
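For context, the max/min preference-pair construction described in the changed paragraph can be sketched as follows. This is a minimal illustration, not the authors' released pipeline: the function name and the placeholder scores are hypothetical, and in the actual setup the scores would come from the RLHFlow/ArmoRM-Llama3-8B-v0.1 reward model.

```python
def build_preference_pair(outputs, scores):
    """From several sampled outputs for one prompt, take the highest-scored
    output as 'chosen' and the lowest-scored as 'rejected'."""
    if len(outputs) != len(scores) or len(outputs) < 2:
        raise ValueError("need at least two scored outputs")
    best = max(range(len(scores)), key=scores.__getitem__)
    worst = min(range(len(scores)), key=scores.__getitem__)
    return {"chosen": outputs[best], "rejected": outputs[worst]}

# Placeholder scores standing in for reward-model outputs.
pair = build_preference_pair(
    ["response A", "response B", "response C"],
    [0.71, 0.42, 0.88],
)
# pair["chosen"] is "response C", pair["rejected"] is "response B"
```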