Adding Evaluation Results

#2
Files changed (1)
  1. README.md +20 -13
README.md CHANGED
@@ -16,8 +16,7 @@ model-index:
  value: 48.13
  name: strict accuracy
  source:
- url: >-
- https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=DeepAutoAI/Explore_Llama-3.2-1B-Inst_v1.1
+ url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=DeepAutoAI/Explore_Llama-3.2-1B-Inst_v1.1
  name: Open LLM Leaderboard
  - task:
  type: text-generation
@@ -32,8 +31,7 @@ model-index:
  value: 5.19
  name: normalized accuracy
  source:
- url: >-
- https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=DeepAutoAI/Explore_Llama-3.2-1B-Inst_v1.1
+ url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=DeepAutoAI/Explore_Llama-3.2-1B-Inst_v1.1
  name: Open LLM Leaderboard
  - task:
  type: text-generation
@@ -48,8 +46,7 @@ model-index:
  value: 1.36
  name: exact match
  source:
- url: >-
- https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=DeepAutoAI/Explore_Llama-3.2-1B-Inst_v1.1
+ url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=DeepAutoAI/Explore_Llama-3.2-1B-Inst_v1.1
  name: Open LLM Leaderboard
  - task:
  type: text-generation
@@ -64,8 +61,7 @@ model-index:
  value: 2.35
  name: acc_norm
  source:
- url: >-
- https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=DeepAutoAI/Explore_Llama-3.2-1B-Inst_v1.1
+ url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=DeepAutoAI/Explore_Llama-3.2-1B-Inst_v1.1
  name: Open LLM Leaderboard
  - task:
  type: text-generation
@@ -80,8 +76,7 @@ model-index:
  value: 4.05
  name: acc_norm
  source:
- url: >-
- https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=DeepAutoAI/Explore_Llama-3.2-1B-Inst_v1.1
+ url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=DeepAutoAI/Explore_Llama-3.2-1B-Inst_v1.1
  name: Open LLM Leaderboard
  - task:
  type: text-generation
@@ -98,8 +93,7 @@ model-index:
  value: 3.05
  name: accuracy
  source:
- url: >-
- https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=DeepAutoAI/Explore_Llama-3.2-1B-Inst_v1.1
+ url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=DeepAutoAI/Explore_Llama-3.2-1B-Inst_v1.1
  name: Open LLM Leaderboard
  ---

@@ -308,4 +302,17 @@ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-le
  |MATH Lvl 5 (4-Shot)| 1.36|
  |GPQA (0-shot) | 2.35|
  |MuSR (0-shot) | 4.05|
- |MMLU-PRO (5-shot) | 3.05|
+ |MMLU-PRO (5-shot) | 3.05|
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_DeepAutoAI__Explore_Llama-3.2-1B-Inst_v1.1)
+
+ | Metric |Value|
+ |-------------------|----:|
+ |Avg. |14.12|
+ |IFEval (0-Shot) |58.44|
+ |BBH (3-Shot) | 8.82|
+ |MATH Lvl 5 (4-Shot)| 6.04|
+ |GPQA (0-shot) | 1.68|
+ |MuSR (0-shot) | 0.66|
+ |MMLU-PRO (5-shot) | 9.09|
+
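The metric table appended to README.md reports an `Avg.` of 14.12. Assuming that figure is the unweighted arithmetic mean of the six benchmark scores (the PR itself does not say how it is computed), the Python sketch below parses the table and recomputes the mean as a quick sanity check. `parse_metric_table` and `check_average` are hypothetical helpers written only for this example.

```python
# Minimal sketch: recompute the "Avg." row of the metric table added in this PR.
# Assumption (not stated in the PR): Avg. is the unweighted mean of the other rows.

TABLE = """\
| Metric |Value|
|-------------------|----:|
|Avg. |14.12|
|IFEval (0-Shot) |58.44|
|BBH (3-Shot) | 8.82|
|MATH Lvl 5 (4-Shot)| 6.04|
|GPQA (0-shot) | 1.68|
|MuSR (0-shot) | 0.66|
|MMLU-PRO (5-shot) | 9.09|
"""


def parse_metric_table(text: str) -> dict[str, float]:
    """Parse a two-column Markdown table into {metric name: value}."""
    rows: dict[str, float] = {}
    # Skip the header row and the alignment row (|---|---:|).
    for line in text.strip().splitlines()[2:]:
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        rows[cells[0]] = float(cells[1])
    return rows


def check_average(rows: dict[str, float]) -> tuple[float, float]:
    """Return (reported Avg., recomputed mean of every other row)."""
    reported = rows["Avg."]
    scores = [value for name, value in rows.items() if name != "Avg."]
    return reported, sum(scores) / len(scores)


if __name__ == "__main__":
    reported, recomputed = check_average(parse_metric_table(TABLE))
    print(f"reported={reported:.2f} recomputed={recomputed:.2f}")
```

Running it prints `reported=14.12 recomputed=14.12`: the six scores sum to 84.73, and 84.73 / 6 ≈ 14.1217, which rounds to the reported 14.12.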