
Add library name and link to code

#1 opened by nielsr (HF staff)

Files changed (1): README.md (+5 −4)
README.md CHANGED

@@ -1,10 +1,11 @@
 ---
+datasets:
+- openbmb/UltraFeedback
 language:
 - en
 license: apache-2.0
-datasets:
-- openbmb/UltraFeedback
 pipeline_tag: text-generation
+library_name: transformers
 model-index:
 - name: GPO-Llama-3-8B-Instruct-GPM-2B
   results: []
@@ -16,6 +17,7 @@ General Preference Modeling with Preference Representations for Aligning Languag
 
 This model was developed using [General Preference Optimization (GPO)](https://arxiv.org/abs/2405.00675) at iteration 3 and the [General Preference representation Model (GPM)](https://arxiv.org/abs/2410.02197) (specifically, using [GPM-Gemma-2B](https://huggingface.co/general-preference/GPM-Gemma-2B)), based on the [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) architecture as starting point. We utilized the prompt sets from the [openbmb/UltraFeedback](https://huggingface.co/datasets/openbmb/UltraFeedback) dataset, splited to 3 parts for 3 iterations by [snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset](https://huggingface.co/datasets/snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset). All responses used are synthetic.
 
+https://github.com/general-preference/general-preference-model
 
 ## Links to Other Models
 - [SPPO-Llama-3-8B-Instruct-GPM-2B](https://huggingface.co/general-preference/SPPO-Llama-3-8B-Instruct-GPM-2B)
@@ -73,5 +75,4 @@ The following hyperparameters were used during training:
   journal={arXiv preprint arXiv:2410.02197},
   year={2024}
 }
-```
-
+```
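For reference, a sketch of the README front matter once this change is applied (reassembled here from the hunk above; the Hub reads this block to render the model page, and `library_name: transformers` is what drives the library badge and auto-generated usage snippet):

```yaml
---
datasets:
- openbmb/UltraFeedback
language:
- en
license: apache-2.0
pipeline_tag: text-generation
library_name: transformers
model-index:
- name: GPO-Llama-3-8B-Instruct-GPM-2B
  results: []
---
```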