Add library name and link to code #1
by nielsr (HF staff) - opened

README.md CHANGED
@@ -1,10 +1,11 @@
 ---
+datasets:
+- openbmb/UltraFeedback
 language:
 - en
 license: apache-2.0
-datasets:
-- openbmb/UltraFeedback
 pipeline_tag: text-generation
+library_name: transformers
 model-index:
 - name: GPO-Llama-3-8B-Instruct-GPM-2B
   results: []
@@ -16,6 +17,7 @@ General Preference Modeling with Preference Representations for Aligning Languag
 
 This model was developed using [General Preference Optimization (GPO)](https://arxiv.org/abs/2405.00675) at iteration 3 and the [General Preference representation Model (GPM)](https://arxiv.org/abs/2410.02197) (specifically, [GPM-Gemma-2B](https://huggingface.co/general-preference/GPM-Gemma-2B)), with [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) as the starting point. We used the prompt sets from the [openbmb/UltraFeedback](https://huggingface.co/datasets/openbmb/UltraFeedback) dataset, split into 3 parts for 3 iterations following [snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset](https://huggingface.co/datasets/snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset). All responses used are synthetic.
 
+https://github.com/general-preference/general-preference-model
 
 ## Links to Other Models
 - [SPPO-Llama-3-8B-Instruct-GPM-2B](https://huggingface.co/general-preference/SPPO-Llama-3-8B-Instruct-GPM-2B)
@@ -73,5 +75,4 @@ The following hyperparameters were used during training:
 journal={arXiv preprint arXiv:2410.02197},
 year={2024}
 }
-```
-
+```
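With `library_name: transformers` declared in the metadata, the Hub can surface a transformers code snippet for this model. A minimal usage sketch along those lines, assuming the repo id `general-preference/GPO-Llama-3-8B-Instruct-GPM-2B` (inferred from the model-index name above) and illustrative generation settings:

```python
# Minimal sketch, not the card's official snippet. The repo id is assumed
# from the model-index name; generation settings are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "general-preference/GPO-Llama-3-8B-Instruct-GPM-2B"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Llama-3-Instruct checkpoints ship a chat template; apply it before generating.
messages = [{"role": "user", "content": "Summarize preference optimization in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```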
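The card also describes splitting the UltraFeedback prompt set into three parts, one per GPO iteration. The actual per-iteration split comes from the linked Snorkel dataset, so the following `datasets` sketch is purely illustrative of such a three-way partition:

```python
# Illustrative only: the real per-iteration split is taken from
# snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset, not recomputed like this.
from datasets import load_dataset

prompts = load_dataset("openbmb/UltraFeedback", split="train")
# Partition the prompts into 3 disjoint shards, one per training iteration.
parts = [prompts.shard(num_shards=3, index=i) for i in range(3)]
for i, part in enumerate(parts, start=1):
    print(f"iteration {i}: {len(part)} prompts")
```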