samagra14wefi committed "Readme changes 1" in commit ad7a3fc (parent: 61ce6b5)

README.md CHANGED

@@ -15,11 +15,11 @@ tags:
 
 PreferED is a 400M parameter preference evaluation model based on the DeBERTa architecture, designed for evaluating LLM apps.
 The model is trained to take in context and text data and output a logit score, which can be used to compare
-different text generations on evaluative aspects such as hallucinations, quality, etc. The
-can be used to provide evaluation criteria in addition to any relevant retrieved context. The
+different text generations on evaluative aspects such as hallucinations, quality, etc. The `context` variable
+can be used to provide evaluation criteria in addition to any relevant retrieved context. The `gen_text` variable
 provides the actual text that is being evaluated.
 
-- **Model name**:
+- **Model name**: PreferED
 - **Model type**: DeBERTa
 - **Training data**: This model was trained on [Anthropic HH/RLHF](https://huggingface.co/datasets/Anthropic/hh-rlhf) using a [Deberta-v3-large](https://huggingface.co/microsoft/deberta-v3-large) base model.
 - **Evaluation data**: Achieves 69.7% accuracy on the Anthropic hh-rlhf test split.
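
The hunks below reference `model`, `tokenizer`, and `device` objects set up earlier in the README (the next hunk's context line is `model = model.to(device)`). That setup code is not part of this commit, so the following is only a minimal sketch of what it typically looks like for a DeBERTa-based sequence-classification checkpoint; the repo id is a placeholder, not the model's actual Hub id.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Placeholder repo id: the real Hub id of the PreferED checkpoint is not shown in this diff.
model_id = "<prefered-checkpoint>"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
```
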
@@ -41,7 +41,7 @@ model = model.to(device)
 
 ### Measuring hallucinations
 
-Use the
+Use the `context` variable to give the retrieved context.
 
 ```python
 def calc_score(context, gen_text):
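
The hunk above stops at the `calc_score` signature, so the function body is not visible in this commit. A minimal sketch of how such a scoring function could work, assuming the `tokenizer`, `model`, and `device` objects from the README's setup code and a single preference logit as output (these are assumptions, not the README's verbatim implementation):

```python
import torch

def calc_score(context, gen_text):
    # Encode the evaluation context and the generated text as a sentence pair.
    inputs = tokenizer(context, gen_text, return_tensors="pt", truncation=True).to(device)
    with torch.no_grad():
        logits = model(**inputs).logits
    # Assumes a single preference logit; a higher score means the generation
    # is preferred (e.g. better grounded in the retrieved context).
    return logits.squeeze().item()

# Hypothetical usage: compare two candidate generations against the same retrieved context.
# score_a = calc_score(retrieved_context, generation_a)
# score_b = calc_score(retrieved_context, generation_b)
```
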
@@ -119,7 +119,7 @@ Here's an example of how your data might look:
 context,text,label
 "Evaluate the accuracy of the statement based on historical facts.","The sun revolves around the Earth.",0
 "Evaluate the accuracy of the statement based on historical facts.","The Earth revolves around the sun.",1
-
+```
 
 You can then load this data into a `Dataset` object using a library such as Hugging Face's `datasets`.
 
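
A minimal sketch of that loading step with the `datasets` library, assuming the CSV shown above has been saved as `train.csv` (the file name is a placeholder):

```python
from datasets import load_dataset

# Point data_files at wherever the CSV above is saved.
dataset = load_dataset("csv", data_files={"train": "train.csv"})

print(dataset["train"][0])
# {'context': 'Evaluate the accuracy of the statement based on historical facts.',
#  'text': 'The sun revolves around the Earth.', 'label': 0}
```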