Add library name and link to code #1
by nielsr (HF staff) - opened

README.md CHANGED
@@ -1,10 +1,11 @@
 ---
+datasets:
+- openbmb/UltraFeedback
 language:
 - en
 license: apache-2.0
-datasets:
-- openbmb/UltraFeedback
 pipeline_tag: text-generation
+library_name: transformers
 model-index:
 - name: GPO-Llama-3-8B-Instruct-GPM-2B
   results: []
@@ -16,6 +17,7 @@ General Preference Modeling with Preference Representations for Aligning Languag
 
 This model was developed using [General Preference Optimization (GPO)](https://arxiv.org/abs/2405.00675) at iteration 3 and the [General Preference representation Model (GPM)](https://arxiv.org/abs/2410.02197) (specifically, [GPM-Gemma-2B](https://huggingface.co/general-preference/GPM-Gemma-2B)), with [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) as the starting point. We used the prompt sets from the [openbmb/UltraFeedback](https://huggingface.co/datasets/openbmb/UltraFeedback) dataset, split into 3 parts for 3 iterations following [snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset](https://huggingface.co/datasets/snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset). All responses used are synthetic.
 
+https://github.com/general-preference/general-preference-model
 
 ## Links to Other Models
 - [SPPO-Llama-3-8B-Instruct-GPM-2B](https://huggingface.co/general-preference/SPPO-Llama-3-8B-Instruct-GPM-2B)
@@ -73,5 +75,4 @@ The following hyperparameters were used during training:
 journal={arXiv preprint arXiv:2410.02197},
 year={2024}
 }
-```
-
+```
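With `library_name: transformers` declared in the metadata, the Hub can surface a transformers code snippet for this model. A minimal usage sketch along those lines, assuming the repo id `general-preference/GPO-Llama-3-8B-Instruct-GPM-2B` (inferred from the model-index name above) and illustrative generation settings:

```python
# Minimal sketch, not the card's official snippet. The repo id is assumed
# from the model-index name; generation settings are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "general-preference/GPO-Llama-3-8B-Instruct-GPM-2B"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Llama-3-Instruct checkpoints ship a chat template; apply it before generating.
messages = [{"role": "user", "content": "Summarize preference optimization in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```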
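The card also describes splitting the UltraFeedback prompt set into three parts, one per GPO iteration. The actual per-iteration split comes from the linked Snorkel dataset, so the following `datasets` sketch is purely illustrative of such a three-way partition:

```python
# Illustrative only: the real per-iteration split is taken from
# snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset, not recomputed like this.
from datasets import load_dataset

prompts = load_dataset("openbmb/UltraFeedback", split="train")
# Partition the prompts into 3 disjoint shards, one per training iteration.
parts = [prompts.shard(num_shards=3, index=i) for i in range(3)]
for i, part in enumerate(parts, start=1):
    print(f"iteration {i}: {len(part)} prompts")
```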