Adding Evaluation Results

This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions

Files changed (1)

README.md +32 -25

@@ -1,38 +1,34 @@
 ---
+language:
+- en
 license: apache-2.0
+library_name: transformers
 tags:
 - generated_from_trainer
 - chatgpt
 - HC3
+datasets:
+- pszemraj/HC3-textgen-qa
 metrics:
 - accuracy
-model-index:
-- name: distilgpt2-HC3
-  results: []
 widget:
-- text: >-
-    Review: Best cast iron skillet you will ever buy. Is this review positive or
-    negative? <answer>
+- text: 'Review: Best cast iron skillet you will ever buy. Is this review positive
+    or negative? <answer>'
   example_title: Sentiment analysis
-- text: >-
-    Barack Obama nominated Hilary Clinton as his secretary of state on Monday.
+- text: Barack Obama nominated Hilary Clinton as his secretary of state on Monday.
     He chose her because <answer>
   example_title: Coreference resolution
-- text: >-
-    On a shelf, there are five books: a gray book, a red book, a purple book, a
-    blue book, and a black book. Here's the puzzle, <answer>
+- text: 'On a shelf, there are five books: a gray book, a red book, a purple book,
+    a blue book, and a black book. Here''s the puzzle, <answer>'
   example_title: Logic puzzles
-- text: >-
-    The two men running to become New York City's next mayor will face off in
+- text: The two men running to become New York City's next mayor will face off in
     their first debate Wednesday night <answer>
   example_title: Reading comprehension
-- text: >-
-    Is it true that if I have five 5-hour energy drinks in a single 24-hour
-    period, I get 25 hours of energy and spontaneously explode? <answer>
+- text: Is it true that if I have five 5-hour energy drinks in a single 24-hour period,
+    I get 25 hours of energy and spontaneously explode? <answer>
   example_title: 5 hour energy
-- text: >-
-    what happens if you train a smaller model on a dataset of
-    reinforcement-learning optimized model responses? <answer>
+- text: what happens if you train a smaller model on a dataset of reinforcement-learning
+    optimized model responses? <answer>
   example_title: deep learning advice
 inference:
   parameters:
@@ -42,12 +38,10 @@ inference:
     repetition_penalty: 1.5
     eta_cutoff: 0.0008
     renormalize_logits: true
-datasets:
-- pszemraj/HC3-textgen-qa
-language:
-- en
-library_name: transformers
 pipeline_tag: text-generation
+model-index:
+- name: distilgpt2-HC3
+  results: []
 ---
@@ -117,4 +111,17 @@ The following hyperparameters were used during training:
 - Transformers 4.27.0.dev0
 - Pytorch 1.11.0+cu113
 - Datasets 2.6.1
-- Tokenizers 0.12.1
+- Tokenizers 0.12.1
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_pszemraj__distilgpt2-HC3)
+|             Metric              |Value|
+|---------------------------------|----:|
+|Avg.                             |28.18|
+|AI2 Reasoning Challenge (25-Shot)|24.66|
+|HellaSwag (10-Shot)              |27.99|
+|MMLU (5-Shot)                    |23.95|
+|TruthfulQA (0-shot)              |42.10|
+|Winogrande (5-shot)              |50.36|
+|GSM8k (5-shot)                   | 0.00|
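
The `Avg.` row in the table above appears to be the unweighted mean of the six benchmark scores. The PR does not state how it is computed, so this is an assumption. A minimal Python sketch that checks it, with the scores copied from the table (the names `scores` and `unweighted_mean` are illustrative only and not part of any leaderboard tooling):

```python
# Benchmark scores copied from the results table in this PR.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 24.66,
    "HellaSwag (10-Shot)": 27.99,
    "MMLU (5-Shot)": 23.95,
    "TruthfulQA (0-shot)": 42.10,
    "Winogrande (5-shot)": 50.36,
    "GSM8k (5-shot)": 0.00,
}


def unweighted_mean(values):
    """Return the arithmetic mean, weighting every benchmark equally."""
    values = list(values)
    return sum(values) / len(values)


# Assumption: the leaderboard average is a plain mean rounded to two decimals.
average = round(unweighted_mean(scores.values()), 2)
print(average)  # 28.18, matching the Avg. row
```

Under that assumption the reported average is consistent with the individual rows. The 0.00 on GSM8k pulls the mean down by several points.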