leaderboard-pr-bot committed
Commit ee6eb62
1 Parent(s): feb7ef4

Adding Evaluation Results


This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions
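
For reference, the `model-index` block this PR adds becomes machine-readable metadata once merged. Below is a minimal sketch of reading the scores back with the `huggingface_hub` `ModelCard` helper; the attribute names follow that library's `EvalResult` objects and are not part of this PR.

```python
# Minimal sketch (not part of this PR): read the model-index metadata back
# from the card after the change is merged, using huggingface_hub.
from huggingface_hub import ModelCard

card = ModelCard.load("Mikael110/llama-2-13b-guanaco-fp16")

# card.data.eval_results is populated from the `model-index` block added here;
# each entry carries the dataset name, metric type, and reported value.
for result in card.data.eval_results or []:
    print(f"{result.dataset_name}: {result.metric_type} = {result.metric_value}")
```

Each printed entry corresponds to one item under `results:` in the YAML shown in the diff below.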

Files changed (1)
  1. README.md +118 -2
README.md CHANGED
@@ -1,9 +1,112 @@
 ---
 language:
 - en
-pipeline_tag: text-classification
 tags:
 - llama-2
+pipeline_tag: text-classification
+model-index:
+- name: llama-2-13b-guanaco-fp16
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: AI2 Reasoning Challenge (25-Shot)
+      type: ai2_arc
+      config: ARC-Challenge
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: acc_norm
+      value: 60.92
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Mikael110/llama-2-13b-guanaco-fp16
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: HellaSwag (10-Shot)
+      type: hellaswag
+      split: validation
+      args:
+        num_few_shot: 10
+    metrics:
+    - type: acc_norm
+      value: 83.18
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Mikael110/llama-2-13b-guanaco-fp16
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU (5-Shot)
+      type: cais/mmlu
+      config: all
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 54.58
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Mikael110/llama-2-13b-guanaco-fp16
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: TruthfulQA (0-shot)
+      type: truthful_qa
+      config: multiple_choice
+      split: validation
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: mc2
+      value: 44.0
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Mikael110/llama-2-13b-guanaco-fp16
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Winogrande (5-shot)
+      type: winogrande
+      config: winogrande_xl
+      split: validation
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 74.9
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Mikael110/llama-2-13b-guanaco-fp16
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GSM8k (5-shot)
+      type: gsm8k
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 11.6
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Mikael110/llama-2-13b-guanaco-fp16
+      name: Open LLM Leaderboard
 ---
 This is a Llama-2 version of [Guanaco](https://huggingface.co/timdettmers/guanaco-13b). It was finetuned from the base [Llama-13b](https://huggingface.co/meta-llama/Llama-2-13b-hf) model using the official training scripts found in the [QLoRA repo](https://github.com/artidoro/qlora). I wanted it to be as faithful as possible and therefore changed nothing in the training script beyond the model it was pointing to. The model prompt is therefore also the same as the original Guanaco model.
 
@@ -11,4 +114,17 @@ This repo contains the merged f16 model. The QLoRA adaptor can be found [here](h
 
 A 7b version of the model can be found [here](https://huggingface.co/Mikael110/llama-2-7b-guanaco-fp16).
 
-**Legal Disclaimer: This model is bound by the usage restrictions of the original Llama-2 model. And comes with no warranty or gurantees of any kind.**
+**Legal Disclaimer: This model is bound by the usage restrictions of the original Llama-2 model. And comes with no warranty or gurantees of any kind.**
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Mikael110__llama-2-13b-guanaco-fp16)
+
+| Metric |Value|
+|---------------------------------|----:|
+|Avg. |54.86|
+|AI2 Reasoning Challenge (25-Shot)|60.92|
+|HellaSwag (10-Shot) |83.18|
+|MMLU (5-Shot) |54.58|
+|TruthfulQA (0-shot) |44.00|
+|Winogrande (5-shot) |74.90|
+|GSM8k (5-shot) |11.60|
+
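
For context on the numbers being added: the `Avg.` row is the arithmetic mean of the six benchmark scores, (60.92 + 83.18 + 54.58 + 44.00 + 74.90 + 11.60) / 6 ≈ 54.86. Separately, here is a minimal sketch of loading the merged fp16 model that the card describes, assuming the standard `transformers` loading path; the `### Human:`/`### Assistant:` template is the usual Guanaco prompt convention and is an assumption here, since the diff itself does not spell out the format.

```python
# Minimal sketch: load the merged fp16 model and run one generation.
# The prompt template below is assumed (standard Guanaco style); the card only
# says the prompt format matches the original Guanaco model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Mikael110/llama-2-13b-guanaco-fp16"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = "### Human: What is QLoRA finetuning?### Assistant:"  # assumed Guanaco-style template
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

In fp16 the 13B weights alone take roughly 26 GB, so `device_map="auto"` lets accelerate offload layers to CPU when a single GPU is too small.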