tags:
- evaluations
---

# PreferED: Preference Evaluation DeBERTa Model

## Model Description

PreferED is a 400M-parameter preference evaluation model based on the DeBERTa architecture, designed for evaluating LLM applications. The model is trained to take in context and text data and output a logit score, which can be used to compare different text generations on evaluative aspects such as hallucinations, quality, etc. The _context_ variable can be used to provide evaluation criteria in addition to any relevant retrieved context, and the _gen_text_ variable provides the actual text that is being evaluated.

- **Model name**: samagra14wefi/PreferED
- **Model type**: DeBERTa
- **Training data**: Trained on [Anthropic HH-RLHF](https://huggingface.co/datasets/Anthropic/hh-rlhf) using a [DeBERTa-v3-large](https://huggingface.co/microsoft/deberta-v3-large) base model.
- **Evaluation data**: Achieves 69.7% accuracy on the Anthropic hh-rlhf split (see the evaluation sketch below).

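This accuracy is presumably pairwise: the fraction of preference pairs in which the chosen response scores higher than the rejected one. The snippet below is a minimal sketch of how such a number could be reproduced, not the original evaluation script; in particular, splitting each transcript on the last `Assistant:` turn and sampling 200 examples are illustrative assumptions.

```python
# Hypothetical evaluation sketch: pairwise accuracy on the hh-rlhf test split.
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("samagra14wefi/PreferED")
model = AutoModelForSequenceClassification.from_pretrained("samagra14wefi/PreferED").eval()

def split_transcript(transcript):
    # hh-rlhf transcripts end with the assistant's reply; treat everything before the
    # last "Assistant:" turn as context and the final reply as the text to score.
    context, _, reply = transcript.rpartition("Assistant:")
    return context, reply.strip()

def score(context, gen_text):
    with torch.no_grad():
        inputs = tokenizer(context, gen_text, return_tensors="pt", truncation=True)
        # assumes the checkpoint emits a single preference logit
        return model(**inputs).logits[0].item()

dataset = load_dataset("Anthropic/hh-rlhf", split="test")
sample = dataset.select(range(200))  # small sample for illustration

correct = sum(
    score(*split_transcript(row["chosen"])) > score(*split_transcript(row["rejected"]))
    for row in sample
)
print(f"Pairwise accuracy: {correct / len(sample):.3f}")
```
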
## Usage

### Loading the Model

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("samagra14wefi/PreferED")
model = AutoModelForSequenceClassification.from_pretrained("samagra14wefi/PreferED")

device = "cuda:0" if torch.cuda.is_available() else "cpu"
model = model.to(device)
```

### Measuring Hallucinations

Use the _context_ variable to provide the retrieved context.

```python
def calc_score(context, gen_text):
    with torch.no_grad():
        inputs = tokenizer(context, gen_text, return_tensors='pt').to(device)
        logits = model(**inputs).logits
        score = logits[0].cpu().detach()
    return score

context_string = '''India won the world cup in 1983 and 2011. Australia won the world cup five times.
West Indies have won the world cup twice. Sri Lanka, Pakistan and England have won the world cup once.
Evaluate if the facts below are consistent with the statement.'''

response_string_one = '''India has won the world cup the most number of times.'''
response_string_two = '''Australia has won the world cup the most number of times.'''

score_one = calc_score(context_string, response_string_one)
score_two = calc_score(context_string, response_string_two)

assert score_two > score_one
```

### Evaluating Response Relevance

```python
inquiry = "What is your return policy?"
response_good = ("Our return policy lasts 30 days. If 30 days have gone by since your purchase, "
                 "unfortunately, we can’t offer you a refund or exchange.")
response_bad = "We offer a variety of fresh produce including apples, oranges, and bananas."

score_good = calc_score(inquiry, response_good)
score_bad = calc_score(inquiry, response_bad)

assert score_good > score_bad
```

### Evaluating Content Appropriateness

```python
context = "Discussing the political scenario in Country X."
response_clean = "The political scenario in Country X is quite dynamic with multiple parties vying for power."
response_offensive = "The politicians in Country X are all corrupt and stupid."

score_clean = calc_score(context, response_clean)
score_offensive = calc_score(context, response_offensive)

assert score_clean > score_offensive
```

### Comparing Different Language Models

```python
context = "Explain the process of photosynthesis."
response_gpt3 = "Photosynthesis is the process by which green plants and some other organisms use sunlight to synthesize foods with the help of chlorophyll pigments."
response_bert = "Photosynthesis is a method that converts carbon dioxide into organic compounds, especially sugars, in the presence of sunlight."

score_gpt3 = calc_score(context, response_gpt3)
score_bert = calc_score(context, response_bert)

assert score_gpt3 > score_bert
```

## Finetuning on your production data

The PreferED model is relatively lightweight compared to other large language models, making it a good candidate for fine-tuning on specific tasks or datasets. Fine-tuning the model on your own production data can lead to better performance, as it helps the model capture the nuances and context specific to your application.

### Preparing the Training Dataset

To fine-tune PreferED on production evaluation tasks, it's crucial to structure your data correctly. Each example should contain a shared context that provides the evaluation criteria, a text input, and a binary label indicating the preference or correctness of the text input relative to those criteria.

Here's an example of how your data might look:

```plaintext
context,text,label
"Evaluate the accuracy of the statement based on historical facts.","The sun revolves around the Earth.",0
"Evaluate the accuracy of the statement based on historical facts.","The Earth revolves around the sun.",1
...
```

You can then load this data into a `Dataset` object using a library such as Hugging Face's `datasets`, as shown in the sketch below.

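A minimal sketch of that loading and tokenization step, assuming the CSV above is saved as `prod_evals.csv` (the file name, the 90/10 split, and the 512-token limit are illustrative choices; adjust label dtype and tokenization to your setup):

```python
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("samagra14wefi/PreferED")

# Load the CSV and carve out a small evaluation split
splits = load_dataset("csv", data_files="prod_evals.csv", split="train").train_test_split(test_size=0.1)

def tokenize(batch):
    # Pair the shared evaluation context with the text being judged, as at inference time
    return tokenizer(batch["context"], batch["text"], truncation=True, padding="max_length", max_length=512)

train_dataset = splits["train"].map(tokenize, batched=True).rename_column("label", "labels")
eval_dataset = splits["test"].map(tokenize, batched=True).rename_column("label", "labels")
```

The resulting `train_dataset` and `eval_dataset` can be passed directly to the `Trainer` in the finetuning example below.
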
### Finetuning Example

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, Trainer, TrainingArguments
import torch

# Auto classes resolve to the correct DeBERTa-v3 tokenizer and model implementations
tokenizer = AutoTokenizer.from_pretrained("samagra14wefi/PreferED")
model = AutoModelForSequenceClassification.from_pretrained("samagra14wefi/PreferED")

# Define the training arguments
training_args = TrainingArguments(
    output_dir='./results',
    per_device_train_batch_size=8,
    num_train_epochs=3,
    logging_dir='./logs',
)

# Create the Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,  # provide your training dataset
    eval_dataset=eval_dataset,    # provide your evaluation dataset
)

# Train the model
trainer.train()
```

### Loss Function Consideration

Anthropic recommends the pairwise preference loss \( L_{PM} = \log\left(1 + e^{\,r_{\text{bad}} - r_{\text{good}}}\right) \) for preference models. This PreferED model, however, was trained with binary cross-entropy loss, so switching loss functions may require additional training time to converge. For more details on preference models and their loss functions, see Askell et al., 2021: [A General Language Assistant as a Laboratory for Alignment](https://arxiv.org/abs/2112.00861).

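For reference, that pairwise loss is simply a softplus of the score difference. A minimal PyTorch sketch (assuming `r_good` and `r_bad` are the model's logit scores for the preferred and rejected texts of each pair):

```python
import torch
import torch.nn.functional as F

def preference_loss(r_good: torch.Tensor, r_bad: torch.Tensor) -> torch.Tensor:
    # L_PM = log(1 + exp(r_bad - r_good)) == softplus(r_bad - r_good)
    return F.softplus(r_bad - r_good).mean()
```

Training with this loss also requires batching examples as (preferred, rejected) pairs, which the `Trainer` setup above does not do out of the box.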