leaderboard-pr-bot committed on
Commit
df14a5e
1 Parent(s): 59f6f37

Adding Evaluation Results

This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions

Files changed (1)
  1. README.md +106 -0
README.md CHANGED
@@ -108,6 +108,98 @@ model-index:
     source:
       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
       name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: IFEval (0-Shot)
+      type: HuggingFaceH4/ifeval
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: inst_level_strict_acc and prompt_level_strict_acc
+      value: 22.77
+      name: strict accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: BBH (3-Shot)
+      type: BBH
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc_norm
+      value: 3.55
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MATH Lvl 5 (4-Shot)
+      type: hendrycks/competition_math
+      args:
+        num_few_shot: 4
+    metrics:
+    - type: exact_match
+      value: 0.83
+      name: exact match
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GPQA (0-shot)
+      type: Idavidrein/gpqa
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 0.34
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MuSR (0-shot)
+      type: TAUR-Lab/MuSR
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 2.19
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU-PRO (5-shot)
+      type: TIGER-Lab/MMLU-Pro
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 1.34
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
+      name: Open LLM Leaderboard
 ---
 <div align="center">
 
@@ -152,3 +244,17 @@ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-le
 |Winogrande (5-shot) |59.51|
 |GSM8k (5-shot) | 1.44|
 
+
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_TinyLlama__TinyLlama-1.1B-intermediate-step-1431k-3T)
+
+|      Metric       |Value|
+|-------------------|----:|
+|Avg.               | 5.17|
+|IFEval (0-Shot)    |22.77|
+|BBH (3-Shot)       | 3.55|
+|MATH Lvl 5 (4-Shot)| 0.83|
+|GPQA (0-shot)      | 0.34|
+|MuSR (0-shot)      | 2.19|
+|MMLU-PRO (5-shot)  | 1.34|
+
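
The `Avg.` row in the added table can be sanity-checked against the six benchmark scores. The sketch below assumes the average is an unweighted arithmetic mean rounded to two decimals; the PR does not state how the leaderboard computes it, so treat that as an assumption that happens to match the reported value. The names `scores` and `mean_score` are hypothetical and used only for illustration.

```python
# Scores copied from the table added by this PR (values are percentages).
scores = {
    "IFEval (0-Shot)": 22.77,
    "BBH (3-Shot)": 3.55,
    "MATH Lvl 5 (4-Shot)": 0.83,
    "GPQA (0-shot)": 0.34,
    "MuSR (0-shot)": 2.19,
    "MMLU-PRO (5-shot)": 1.34,
}


def mean_score(values):
    """Unweighted arithmetic mean, rounded to two decimals like the table.

    Assumption: the leaderboard average is a plain mean of the six scores.
    """
    values = list(values)
    return round(sum(values) / len(values), 2)


average = mean_score(scores.values())
print(f"Avg. = {average:.2f}")
```

If the printed value differs from the `Avg.` row after a later update to the model card, either a score or the average was likely edited by hand.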