leaderboard-pt-pr-bot committed on
Commit
f83e9a9
1 Parent(s): 4cd7f88

Adding the Open Portuguese LLM Leaderboard Evaluation Results

This is an automated PR created with https://huggingface.co/spaces/eduagarcia-temp/portuguese-leaderboard-results-to-modelcard

The purpose of this PR is to add evaluation results from the Open Portuguese LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/eduagarcia-temp/portuguese-leaderboard-results-to-modelcard/discussions

Files changed (1)
  1. README.md +170 -4
README.md CHANGED
@@ -1,16 +1,163 @@
 ---
+language:
+- pt
+license: apache-2.0
 library_name: transformers
 tags:
 - mixtral
 - portuguese
 - portugues
-license: apache-2.0
+base_model: mistralai/Mixtral-8x7B-Instruct-v0.1
 datasets:
 - rhaymison/superset
-language:
-- pt
 pipeline_tag: text-generation
-base_model: mistralai/Mixtral-8x7B-Instruct-v0.1
+model-index:
+- name: Mistral-8x7b-portuguese-luana
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: ENEM Challenge (No Images)
+      type: eduagarcia/enem_challenge
+      split: train
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc
+      value: 69.63
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=rhaymison/Mistral-8x7b-portuguese-luana
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: BLUEX (No Images)
+      type: eduagarcia-temp/BLUEX_without_images
+      split: train
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc
+      value: 59.11
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=rhaymison/Mistral-8x7b-portuguese-luana
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: OAB Exams
+      type: eduagarcia/oab_exams
+      split: train
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc
+      value: 49.61
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=rhaymison/Mistral-8x7b-portuguese-luana
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Assin2 RTE
+      type: assin2
+      split: test
+      args:
+        num_few_shot: 15
+    metrics:
+    - type: f1_macro
+      value: 61.21
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=rhaymison/Mistral-8x7b-portuguese-luana
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Assin2 STS
+      type: eduagarcia/portuguese_benchmark
+      split: test
+      args:
+        num_few_shot: 15
+    metrics:
+    - type: pearson
+      value: 79.95
+      name: pearson
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=rhaymison/Mistral-8x7b-portuguese-luana
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: FaQuAD NLI
+      type: ruanchaves/faquad-nli
+      split: test
+      args:
+        num_few_shot: 15
+    metrics:
+    - type: f1_macro
+      value: 78.6
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=rhaymison/Mistral-8x7b-portuguese-luana
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: HateBR Binary
+      type: ruanchaves/hatebr
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: f1_macro
+      value: 72.42
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=rhaymison/Mistral-8x7b-portuguese-luana
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: PT Hate Speech Binary
+      type: hate_speech_portuguese
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: f1_macro
+      value: 73.01
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=rhaymison/Mistral-8x7b-portuguese-luana
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: tweetSentBR
+      type: eduagarcia/tweetsentbr_fewshot
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: f1_macro
+      value: 50.9
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=rhaymison/Mistral-8x7b-portuguese-luana
+      name: Open Portuguese LLM Leaderboard
 ---
 
 # Mistral-8x7b-portuguese-luana
@@ -118,3 +265,22 @@ email: [email protected]
 <img src="https://img.shields.io/badge/GitHub-100000?style=for-the-badge&logo=github&logoColor=white">
 </a>
 </div>
+
+
+# Open Portuguese LLM Leaderboard Evaluation Results
+
+Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/rhaymison/Mistral-8x7b-portuguese-luana) and on the [🚀 Open Portuguese LLM Leaderboard](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard)
+
+|          Metric          |  Value  |
+|--------------------------|---------|
+|Average                   |**66.05**|
+|ENEM Challenge (No Images)|    69.63|
+|BLUEX (No Images)         |    59.11|
+|OAB Exams                 |    49.61|
+|Assin2 RTE                |    61.21|
+|Assin2 STS                |    79.95|
+|FaQuAD NLI                |    78.60|
+|HateBR Binary             |    72.42|
+|PT Hate Speech Binary     |    73.01|
+|tweetSentBR               |    50.90|
+
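
As a quick sanity check on the results table added by this PR, the Average row appears to be the unweighted mean of the nine task scores. The sketch below recomputes it in Python. The `scores` dictionary and the `mean_score` helper are illustrative names, not part of the leaderboard's tooling, and the use of a plain arithmetic mean rounded to two decimals is an assumption inferred from the numbers rather than something this PR states.

```python
from statistics import mean

# Task scores copied from the results table in this PR.
scores = {
    "ENEM Challenge (No Images)": 69.63,
    "BLUEX (No Images)": 59.11,
    "OAB Exams": 49.61,
    "Assin2 RTE": 61.21,
    "Assin2 STS": 79.95,
    "FaQuAD NLI": 78.60,
    "HateBR Binary": 72.42,
    "PT Hate Speech Binary": 73.01,
    "tweetSentBR": 50.90,
}


def mean_score(values):
    """Unweighted arithmetic mean rounded to two decimals (assumed convention)."""
    return round(mean(list(values)), 2)


average = mean_score(scores.values())
print(f"Average: {average:.2f}")  # prints "Average: 66.05"
```

Under that assumption the recomputed value matches the 66.05 reported in the table.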