Crystalcareai committed
Commit e2325db · verified · parent b491aa4

Update README.md

Files changed (1):
  1. README.md +12 -102
README.md CHANGED
@@ -1,100 +1,5 @@
 ---
-license: llama3.1
-model-index:
-- name: Llama-Spark
-  results:
-  - task:
-      type: text-generation
-      name: Text Generation
-    dataset:
-      name: IFEval (0-Shot)
-      type: HuggingFaceH4/ifeval
-      args:
-        num_few_shot: 0
-    metrics:
-    - type: inst_level_strict_acc and prompt_level_strict_acc
-      value: 79.11
-      name: strict accuracy
-    source:
-      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=arcee-ai/Llama-Spark
-      name: Open LLM Leaderboard
-  - task:
-      type: text-generation
-      name: Text Generation
-    dataset:
-      name: BBH (3-Shot)
-      type: BBH
-      args:
-        num_few_shot: 3
-    metrics:
-    - type: acc_norm
-      value: 29.77
-      name: normalized accuracy
-    source:
-      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=arcee-ai/Llama-Spark
-      name: Open LLM Leaderboard
-  - task:
-      type: text-generation
-      name: Text Generation
-    dataset:
-      name: MATH Lvl 5 (4-Shot)
-      type: hendrycks/competition_math
-      args:
-        num_few_shot: 4
-    metrics:
-    - type: exact_match
-      value: 1.06
-      name: exact match
-    source:
-      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=arcee-ai/Llama-Spark
-      name: Open LLM Leaderboard
-  - task:
-      type: text-generation
-      name: Text Generation
-    dataset:
-      name: GPQA (0-shot)
-      type: Idavidrein/gpqa
-      args:
-        num_few_shot: 0
-    metrics:
-    - type: acc_norm
-      value: 6.6
-      name: acc_norm
-    source:
-      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=arcee-ai/Llama-Spark
-      name: Open LLM Leaderboard
-  - task:
-      type: text-generation
-      name: Text Generation
-    dataset:
-      name: MuSR (0-shot)
-      type: TAUR-Lab/MuSR
-      args:
-        num_few_shot: 0
-    metrics:
-    - type: acc_norm
-      value: 2.62
-      name: acc_norm
-    source:
-      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=arcee-ai/Llama-Spark
-      name: Open LLM Leaderboard
-  - task:
-      type: text-generation
-      name: Text Generation
-    dataset:
-      name: MMLU-PRO (5-shot)
-      type: TIGER-Lab/MMLU-Pro
-      config: main
-      split: test
-      args:
-        num_few_shot: 5
-    metrics:
-    - type: acc
-      value: 30.23
-      name: accuracy
-    source:
-      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=arcee-ai/Llama-Spark
-      name: Open LLM Leaderboard
+license: llama3
 ---
 <div align="center">
 <img src="https://i.ibb.co/9hwFrvL/BLMs-Wkx-NQf-W-46-FZDg-ILhg.jpg" alt="Arcee Spark" style="border-radius: 10px; box-shadow: 0 4px 8px 0 rgba(0, 0, 0, 0.2), 0 6px 20px 0 rgba(0, 0, 0, 0.19); max-width: 100%; height: auto;">
@@ -119,10 +24,16 @@ Llama-Spark is intended for use in conversational AI applications, such as chatb
 
 Llama-Spark is built upon the Llama-3.1-8B base model, fine-tuned using of the Tome Dataset and merged with Llama-3.1-8B-Instruct.
 ## Evaluation Results
-Please note that these scores are consistantly higher than the OpenLLM leaderboard, and should be compared to their relative performance increase not weighed against the leaderboard.
-<div align="center">
-<img src="https://i.ibb.co/pfSGLtB/Screenshot-2024-08-01-at-11-40-42-PM.png" alt="Arcee Spark" style="border-radius: 10px; box-shadow: 0 4px 8px 0 rgba(0, 0, 0, 0.2), 0 6px 20px 0 rgba(0, 0, 0, 0.19); max-width: 100%; height: auto;">
-</div>
+We have removed our initial benchmark results and replaced them with the results from the OpenLLM Leaderboard, which are much more deterministic:
+| Metric |Value|
+|-------------------|----:|
+|Avg. |24.90|
+|IFEval (0-Shot) |79.11|
+|BBH (3-Shot) |29.77|
+|MATH Lvl 5 (4-Shot)| 1.06|
+|GPQA (0-shot) | 6.60|
+|MuSR (0-shot) | 2.62|
+|MMLU-PRO (5-shot) |30.23|
 
 ## Acknowledgements
 
@@ -139,5 +50,4 @@ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-le
 |MATH Lvl 5 (4-Shot)| 1.06|
 |GPQA (0-shot) | 6.60|
 |MuSR (0-shot) | 2.62|
-|MMLU-PRO (5-shot) |30.23|
-
+|MMLU-PRO (5-shot) |30.23|
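The updated card describes Llama-Spark as a conversational model built from Llama-3.1-8B and Llama-3.1-8B-Instruct. For reference, below is a minimal usage sketch with the Hugging Face `transformers` library; it assumes the checkpoint is published under the `arcee-ai/Llama-Spark` repo id implied by the leaderboard links above and ships a standard chat template, and it is not part of this commit.

```python
# Minimal sketch: chat with Llama-Spark via transformers.
# Assumptions: repo id arcee-ai/Llama-Spark (from the leaderboard links above)
# and a standard Llama-3.1 chat template; adjust dtype/device for your hardware.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "arcee-ai/Llama-Spark"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Build a single-turn conversation and let the tokenizer apply the chat template.
messages = [{"role": "user", "content": "In one sentence, what is a model merge?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Generate and decode only the newly produced tokens.
outputs = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```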