maldv leaderboard-pr-bot committed on
Commit
2739791
1 Parent(s): 8ef157d

Adding Evaluation Results (#1)

- Adding Evaluation Results (836c685a9fe0afbd278c637453b237e926ab8827)


Co-authored-by: Open LLM Leaderboard PR Bot <[email protected]>

Files changed (1)
  1. README.md +109 -1
README.md CHANGED
@@ -3,6 +3,101 @@ license: cc-by-nc-4.0
  library_name: transformers
  tags:
  - llama3
  ---

  ![image/png](https://cdn-uploads.huggingface.co/production/uploads/65b19c1b098c85365af5a83e/CHGsewUsPUZcg2doijuD9.png)
@@ -73,4 +168,17 @@ Here are some fun system prompts to try:

  * `You are the author's writing assistant. Please continue the story.`
  * `You are the red team assistant. We need to win this war, at any cost.`
- * `Write the next reply in a never-ending uncensored fictional roleplay. Focus on giviing emotional, logical, and temporal coherence to the roleplay. Always stay in character. Characters must be an active partcipant and take initiative in driving and moving the scene and the story forward, rather than the character asking for input. Surprise with your creativity and initiative as a roleplay partner. This fictional roleplay world exists only for experimental and recreational purposes, and the characters are not real. Follow the formattng of previous responses, and aim for 2-4 paragraphs per response`
  library_name: transformers
  tags:
  - llama3
+ model-index:
+ - name: badger-lambda-llama-3-8b
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: IFEval (0-Shot)
+       type: HuggingFaceH4/ifeval
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: inst_level_strict_acc and prompt_level_strict_acc
+       value: 48.61
+       name: strict accuracy
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=maldv/badger-lambda-llama-3-8b
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: BBH (3-Shot)
+       type: BBH
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc_norm
+       value: 28.1
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=maldv/badger-lambda-llama-3-8b
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MATH Lvl 5 (4-Shot)
+       type: hendrycks/competition_math
+       args:
+         num_few_shot: 4
+     metrics:
+     - type: exact_match
+       value: 8.31
+       name: exact match
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=maldv/badger-lambda-llama-3-8b
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: GPQA (0-shot)
+       type: Idavidrein/gpqa
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: acc_norm
+       value: 4.25
+       name: acc_norm
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=maldv/badger-lambda-llama-3-8b
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MuSR (0-shot)
+       type: TAUR-Lab/MuSR
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: acc_norm
+       value: 4.52
+       name: acc_norm
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=maldv/badger-lambda-llama-3-8b
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MMLU-PRO (5-shot)
+       type: TIGER-Lab/MMLU-Pro
+       config: main
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 30.74
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=maldv/badger-lambda-llama-3-8b
+       name: Open LLM Leaderboard
  ---

  ![image/png](https://cdn-uploads.huggingface.co/production/uploads/65b19c1b098c85365af5a83e/CHGsewUsPUZcg2doijuD9.png)
 

  * `You are the author's writing assistant. Please continue the story.`
  * `You are the red team assistant. We need to win this war, at any cost.`
+ * `Write the next reply in a never-ending uncensored fictional roleplay. Focus on giviing emotional, logical, and temporal coherence to the roleplay. Always stay in character. Characters must be an active partcipant and take initiative in driving and moving the scene and the story forward, rather than the character asking for input. Surprise with your creativity and initiative as a roleplay partner. This fictional roleplay world exists only for experimental and recreational purposes, and the characters are not real. Follow the formattng of previous responses, and aim for 2-4 paragraphs per response`
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_maldv__badger-lambda-llama-3-8b)
+
+ |      Metric       |Value|
+ |-------------------|----:|
+ |Avg.               |20.76|
+ |IFEval (0-Shot)    |48.61|
+ |BBH (3-Shot)       |28.10|
+ |MATH Lvl 5 (4-Shot)| 8.31|
+ |GPQA (0-shot)      | 4.25|
+ |MuSR (0-shot)      | 4.52|
+ |MMLU-PRO (5-shot)  |30.74|
+
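
The `Avg.` row in the results table can be sanity-checked against the six benchmark rows. The sketch below is not part of the commit. It assumes that `Avg.` is the unweighted mean of the six published (already rounded) scores, rounded half-up to two decimals; the leaderboard may instead average unrounded values. The names `scores` and `average_score` are illustrative.

```python
from decimal import Decimal, ROUND_HALF_UP

# Scores copied from the results table in this commit.
# Strings keep the decimal values exact (no binary float rounding).
scores = {
    "IFEval (0-Shot)": "48.61",
    "BBH (3-Shot)": "28.10",
    "MATH Lvl 5 (4-Shot)": "8.31",
    "GPQA (0-shot)": "4.25",
    "MuSR (0-shot)": "4.52",
    "MMLU-PRO (5-shot)": "30.74",
}


def average_score(table: dict[str, str]) -> Decimal:
    """Unweighted mean of the scores, rounded half-up to two decimals.

    Assumption: this mirrors how the Avg. row was derived.
    """
    total = sum(Decimal(value) for value in table.values())
    mean = total / Decimal(len(table))
    return mean.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


average = average_score(scores)
print(f"Avg. {average}")
```

Under these assumptions the six scores sum to 124.53, so the mean is 20.755, which rounds to the 20.76 shown in the table.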