Adding Evaluation Results #6
opened by leaderboard-pr-bot

README.md CHANGED
@@ -310,3 +310,17 @@ The `first_n` function takes an integer `n` as input, and calculates the first n
 fiddled with his brakes." The salesman quips, "And I'll have a martini, shaken not stirred. After all, I have to sell this guy a car that doesn't break down on him within the first year of ownership."
 ```
 
+
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_TheBloke__wizard-mega-13B-GPTQ)
+
+| Metric | Value |
+|-----------------------|---------------------------|
+| Avg. | 31.08 |
+| ARC (25-shot) | 27.73 |
+| HellaSwag (10-shot) | 26.01 |
+| MMLU (5-shot) | 24.97 |
+| TruthfulQA (0-shot) | 48.69 |
+| Winogrande (5-shot) | 74.74 |
+| GSM8K (5-shot) | 8.95 |
+| DROP (3-shot) | 6.48 |
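
The `Avg.` row in the added table is consistent with an unweighted mean of the seven benchmark scores. The PR does not state how the average is computed, so the formula below is an assumption; the sketch simply re-checks the arithmetic in Python with the values copied from the table. The names `scores` and `mean_score` are illustrative and not part of the leaderboard tooling.

```python
# Benchmark scores copied from the table in this PR (percentages).
scores = {
    "ARC (25-shot)": 27.73,
    "HellaSwag (10-shot)": 26.01,
    "MMLU (5-shot)": 24.97,
    "TruthfulQA (0-shot)": 48.69,
    "Winogrande (5-shot)": 74.74,
    "GSM8K (5-shot)": 8.95,
    "DROP (3-shot)": 6.48,
}


def mean_score(values):
    """Unweighted arithmetic mean, rounded to two decimals.

    Assumption: every benchmark carries equal weight in the average.
    """
    values = list(values)
    return round(sum(values) / len(values), 2)


average = mean_score(scores.values())
print(average)  # 31.08, matching the "Avg." row
```

Under that assumption the computed value matches the reported 31.08, so the table is internally consistent.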