Update README.md
README.md (changed):

```diff
@@ -3,6 +3,7 @@ base_model: google/gemma-2-9b-it
 library_name: transformers
 datasets:
 - openbmb/UltraFeedback
+- wzhouad/gemma-2-ultrafeedback-hybrid
 tags:
 - alignment-handbook
 - gemma
@@ -18,6 +19,8 @@ gemma-2-9b-it finetuned by hybrid WPO, utilizing two types of data:
 
 In comparison to the preference data construction method in our paper, we switch to RLHFlow/ArmoRM-Llama3-8B-v0.1 to score the outputs, and choose the outputs with maximum/minimum scores to form a preference pair.
 
+We provide our training data at [wzhouad/gemma-2-ultrafeedback-hybrid](https://huggingface.co/datasets/wzhouad/gemma-2-ultrafeedback-hybrid)
+
 ### [AlpacaEval Eval Results](https://tatsu-lab.github.io/alpaca_eval/)
 | Model | LC | WR | Avg. Length |
 |-------------------------------------------|:------------:|:--------:|:-----------:|
```
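For context, the max/min preference-pair construction described in the changed paragraph can be sketched as follows. This is a minimal illustration, not the authors' released pipeline: the function name and the placeholder scores are hypothetical, and in the actual setup the scores would come from the RLHFlow/ArmoRM-Llama3-8B-v0.1 reward model.

```python
def build_preference_pair(outputs, scores):
    """From several sampled outputs for one prompt, take the highest-scored
    output as 'chosen' and the lowest-scored as 'rejected'."""
    if len(outputs) != len(scores) or len(outputs) < 2:
        raise ValueError("need at least two scored outputs")
    best = max(range(len(scores)), key=scores.__getitem__)
    worst = min(range(len(scores)), key=scores.__getitem__)
    return {"chosen": outputs[best], "rejected": outputs[worst]}

# Placeholder scores standing in for reward-model outputs.
pair = build_preference_pair(
    ["response A", "response B", "response C"],
    [0.71, 0.42, 0.88],
)
# pair["chosen"] is "response C", pair["rejected"] is "response B"
```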