pszemraj and leaderboard-pr-bot committed
Commit bb0773e
1 parent: 6f9ad47

Adding Evaluation Results (#3)


- Adding Evaluation Results (1362734727838f5dbcf61a6fdbf5f9bf02a2b115)


Co-authored-by: Open LLM Leaderboard PR Bot <[email protected]>

Files changed (1)
  1. README.md +32 -25
README.md CHANGED
@@ -1,38 +1,34 @@
 ---
+language:
+- en
 license: apache-2.0
+library_name: transformers
 tags:
 - generated_from_trainer
 - chatgpt
 - HC3
+datasets:
+- pszemraj/HC3-textgen-qa
 metrics:
 - accuracy
-model-index:
-- name: distilgpt2-HC3
-  results: []
 widget:
-- text: >-
-    Review: Best cast iron skillet you will ever buy. Is this review positive or
-    negative? <answer>
+- text: 'Review: Best cast iron skillet you will ever buy. Is this review positive
+    or negative? <answer>'
   example_title: Sentiment analysis
-- text: >-
-    Barack Obama nominated Hilary Clinton as his secretary of state on Monday.
+- text: Barack Obama nominated Hilary Clinton as his secretary of state on Monday.
     He chose her because <answer>
   example_title: Coreference resolution
-- text: >-
-    On a shelf, there are five books: a gray book, a red book, a purple book, a
-    blue book, and a black book. Here's the puzzle, <answer>
+- text: 'On a shelf, there are five books: a gray book, a red book, a purple book,
+    a blue book, and a black book. Here''s the puzzle, <answer>'
   example_title: Logic puzzles
-- text: >-
-    The two men running to become New York City's next mayor will face off in
+- text: The two men running to become New York City's next mayor will face off in
     their first debate Wednesday night <answer>
   example_title: Reading comprehension
-- text: >-
-    Is it true that if I have five 5-hour energy drinks in a single 24-hour
-    period, I get 25 hours of energy and spontaneously explode? <answer>
+- text: Is it true that if I have five 5-hour energy drinks in a single 24-hour period,
+    I get 25 hours of energy and spontaneously explode? <answer>
   example_title: 5 hour energy
-- text: >-
-    what happens if you train a smaller model on a dataset of
-    reinforcement-learning optimized model responses? <answer>
+- text: what happens if you train a smaller model on a dataset of reinforcement-learning
+    optimized model responses? <answer>
   example_title: deep learning advice
 inference:
   parameters:
@@ -42,12 +38,10 @@ inference:
     repetition_penalty: 1.5
     eta_cutoff: 0.0008
     renormalize_logits: true
-datasets:
-- pszemraj/HC3-textgen-qa
-language:
-- en
-library_name: transformers
 pipeline_tag: text-generation
+model-index:
+- name: distilgpt2-HC3
+  results: []
 ---
 
 
@@ -117,4 +111,17 @@ The following hyperparameters were used during training:
 - Transformers 4.27.0.dev0
 - Pytorch 1.11.0+cu113
 - Datasets 2.6.1
-- Tokenizers 0.12.1
+- Tokenizers 0.12.1
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_pszemraj__distilgpt2-HC3)
+
+|             Metric              |Value|
+|---------------------------------|----:|
+|Avg.                             |28.18|
+|AI2 Reasoning Challenge (25-Shot)|24.66|
+|HellaSwag (10-Shot)              |27.99|
+|MMLU (5-Shot)                    |23.95|
+|TruthfulQA (0-shot)              |42.10|
+|Winogrande (5-shot)              |50.36|
+|GSM8k (5-shot)                   | 0.00|
+
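The `Avg.` value in the leaderboard table matches an unweighted arithmetic mean of the six benchmark scores: (24.66 + 27.99 + 23.95 + 42.10 + 50.36 + 0.00) / 6 = 169.06 / 6 ≈ 28.18. The following is a minimal Python sketch that re-derives the value from the table. It assumes the leaderboard uses a simple mean; the commit does not say how `Avg.` is computed. The helper name `leaderboard_average` is hypothetical.

```python
# Scores copied from the leaderboard table added in this commit.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 24.66,
    "HellaSwag (10-Shot)": 27.99,
    "MMLU (5-Shot)": 23.95,
    "TruthfulQA (0-shot)": 42.10,
    "Winogrande (5-shot)": 50.36,
    "GSM8k (5-shot)": 0.00,
}


def leaderboard_average(values):
    """Hypothetical helper: unweighted mean of benchmark scores (assumed aggregation)."""
    values = list(values)
    return sum(values) / len(values)


avg = leaderboard_average(scores.values())
print(f"Avg. {avg:.2f}")  # prints "Avg. 28.18"
```

Because the GSM8k score is 0.00, it counts fully in the denominator and pulls the mean well below the other five scores.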