samagra14wefi committed "Readme changes 1" in commit ad7a3fc (parent: 61ce6b5)

README.md CHANGED

@@ -15,11 +15,11 @@ tags:
 
 PreferED is a 400M parameter preference evaluation model based on the DeBERTa architecture, designed for evaluating LLM apps.
 The model is trained to take in context and text data and output a logit score, which can be used to compare
-different text generations on evaluative aspects such as hallucinations, quality, etc. The
-can be used to provide evaluation criteria in addition to any relevant retrieved context. The
+different text generations on evaluative aspects such as hallucinations, quality, etc. The `context` variable
+can be used to provide evaluation criteria in addition to any relevant retrieved context. The `gen_text` variable
 provides the actual text that is being evaluated.
 
-- **Model name**:
+- **Model name**: PreferED
 - **Model type**: DeBERTa
 - **Training data**: This model was trained on [Anthropic HH/RLHF](https://huggingface.co/datasets/Anthropic/hh-rlhf) using a [Deberta-v3-large](https://huggingface.co/microsoft/deberta-v3-large) base model.
 - **Evaluation data**: Achieves 69.7% accuracy on the Anthropic hh-rlhf test split.
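
The hunks below reference `model`, `tokenizer`, and `device` objects set up earlier in the README (the next hunk's context line is `model = model.to(device)`). That setup code is not part of this commit, so the following is only a minimal sketch of what it typically looks like for a DeBERTa-based sequence-classification checkpoint; the repo id is a placeholder, not the model's actual Hub id.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Placeholder repo id: the real Hub id of the PreferED checkpoint is not shown in this diff.
model_id = "<prefered-checkpoint>"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
```
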
@@ -41,7 +41,7 @@ model = model.to(device)
 
 ### Measuring hallucinations
 
-Use the
+Use the `context` variable to give the retrieved context.
 
 ```python
 def calc_score(context, gen_text):
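
The hunk above stops at the `calc_score` signature, so the function body is not visible in this commit. A minimal sketch of how such a scoring function could work, assuming the `tokenizer`, `model`, and `device` objects from the README's setup code and a single preference logit as output (these are assumptions, not the README's verbatim implementation):

```python
import torch

def calc_score(context, gen_text):
    # Encode the evaluation context and the generated text as a sentence pair.
    inputs = tokenizer(context, gen_text, return_tensors="pt", truncation=True).to(device)
    with torch.no_grad():
        logits = model(**inputs).logits
    # Assumes a single preference logit; a higher score means the generation
    # is preferred (e.g. better grounded in the retrieved context).
    return logits.squeeze().item()

# Hypothetical usage: compare two candidate generations against the same retrieved context.
# score_a = calc_score(retrieved_context, generation_a)
# score_b = calc_score(retrieved_context, generation_b)
```
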
@@ -119,7 +119,7 @@ Here's an example of how your data might look:
 context,text,label
 "Evaluate the accuracy of the statement based on historical facts.","The sun revolves around the Earth.",0
 "Evaluate the accuracy of the statement based on historical facts.","The Earth revolves around the sun.",1
-
+```
 
 You can then load this data into a `Dataset` object using a library such as Hugging Face's `datasets`.
 
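
A minimal sketch of that loading step with the `datasets` library, assuming the CSV shown above has been saved as `train.csv` (the file name is a placeholder):

```python
from datasets import load_dataset

# Point data_files at wherever the CSV above is saved.
dataset = load_dataset("csv", data_files={"train": "train.csv"})

print(dataset["train"][0])
# {'context': 'Evaluate the accuracy of the statement based on historical facts.',
#  'text': 'The sun revolves around the Earth.', 'label': 0}
```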