samagra14wefi committed
Commit ad7a3fc
1 Parent(s): 61ce6b5

Readme changes 1

Files changed (1)
  1. README.md +5 -5
README.md CHANGED
@@ -15,11 +15,11 @@ tags:
 
 PreferED is a 400M parameter preference evaluation model based on the DeBERTa architecture, designed for evaluating LLM apps.
 The model is trained to take in context and text data and output a logit score, which can be used to compare
- different text generations on evaluative aspects such as hallucinations, quality, etc. The _context_ variable
- can be used to provide evaluation criteria in addition to any relevant retrieved context. The _gen_text_ variable
+ different text generations on evaluative aspects such as hallucinations, quality, etc. The `context` variable
+ can be used to provide evaluation criteria in addition to any relevant retrieved context. The `gen_text` variable
 provides the actual text that is being evaluated.
 
- - **Model name**: samagra14wefi/PreferED
+ - **Model name**: PreferED
 - **Model type**: DeBERTa
 - **Training data**: This model was trained on [Anthropic HH/RLHF](https://huggingface.co/datasets/Anthropic/hh-rlhf) using a [Deberta-v3-large](https://huggingface.co/microsoft/deberta-v3-large) base model
 - **Evaluation data**: Achieves 69.7% accuracy on the Anthropic hh-rlhf test split.
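The hunk above describes the model's interface (a `context` string plus the generated text, scored with a single logit), and the next hunk's header shows the README already moves a loaded model to a device. A minimal loading sketch, assuming the checkpoint is hosted at samagra14wefi/PreferED and exposes a standard sequence-classification head; the full README, not this diff, is authoritative:

```python
# Hedged sketch of the loading step implied by the next hunk's header
# ("model = model.to(device)"); assumes a standard sequence-classification
# checkpoint at samagra14wefi/PreferED.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained("samagra14wefi/PreferED")
model = AutoModelForSequenceClassification.from_pretrained("samagra14wefi/PreferED")
model = model.to(device)
model.eval()
```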
@@ -41,7 +41,7 @@ model = model.to(device)
 
 ### Measuring hallucinations
 
- Use the _context_ variable to give the retrieved context.
+ Use the `context` variable to give the retrieved context.
 
 ```python
 def calc_score(context, gen_text):
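The hunk truncates `calc_score` at its signature. A hedged sketch of what such a scorer and a pairwise comparison could look like, reusing the loading code above and the example statements from the data section below; the encoding details are assumptions, and the README's own implementation is authoritative:

```python
# Hedged sketch, not necessarily the README's exact implementation:
# score a (context, gen_text) pair and compare two candidate generations.
def calc_score(context, gen_text):
    # One plausible encoding: context and generation as a sentence pair.
    inputs = tokenizer(context, gen_text, return_tensors="pt",
                       truncation=True, max_length=512).to(device)
    with torch.no_grad():
        logits = model(**inputs).logits
    return logits.squeeze().item()  # higher logit = preferred text

context = "Evaluate the accuracy of the statement based on historical facts."
grounded = "The Earth revolves around the sun."  # factually wrong
accurate = "The Earth revolves around the sun."
hallucinated = "The sun revolves around the Earth."
if calc_score(context, accurate) > calc_score(context, hallucinated):
    print("The accurate statement is preferred.")
```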
@@ -119,7 +119,7 @@ Here's an example of how your data might look:
 context,text,label
 "Evaluate the accuracy of the statement based on historical facts.","The sun revolves around the Earth.",0
 "Evaluate the accuracy of the statement based on historical facts.","The Earth revolves around the sun.",1
- ...
+ ```
 
 You can then load this data into a `Dataset` object using a library such as Hugging Face's `datasets`.
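As the closing line notes, the CSV can be read into a `Dataset` with Hugging Face's `datasets`. A minimal sketch, assuming the rows above are saved to an illustrative `train.csv`:

```python
# Hedged sketch: load the context,text,label CSV shown above with the
# datasets library; "train.csv" is an illustrative file name.
from datasets import load_dataset

dataset = load_dataset("csv", data_files={"train": "train.csv"})

def tokenize(batch):
    # Pair each evaluation context with the text being judged.
    return tokenizer(batch["context"], batch["text"],
                     truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True)
```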
 
 