leaderboard-pt-pr-bot committed on
Commit 91dfcdb · verified · 1 Parent(s): 86c4ccc

Adding the Open Portuguese LLM Leaderboard Evaluation Results

This is an automated PR created with https://huggingface.co/spaces/eduagarcia-temp/portuguese-leaderboard-results-to-modelcard

The purpose of this PR is to add evaluation results from the Open Portuguese LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/eduagarcia-temp/portuguese-leaderboard-results-to-modelcard/discussions

Files changed (1)
  1. README.md +139 -1
README.md CHANGED
@@ -1,8 +1,129 @@
 ---
-license: other
 language:
 - pt
 - en
+license: other
+model-index:
+- name: cabra13b
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: ENEM Challenge (No Images)
+      type: eduagarcia/enem_challenge
+      split: train
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc
+      value: 48.85
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=nicolasdec/cabra13b
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: BLUEX (No Images)
+      type: eduagarcia-temp/BLUEX_without_images
+      split: train
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc
+      value: 40.89
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=nicolasdec/cabra13b
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: OAB Exams
+      type: eduagarcia/oab_exams
+      split: train
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc
+      value: 35.31
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=nicolasdec/cabra13b
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Assin2 RTE
+      type: assin2
+      split: test
+      args:
+        num_few_shot: 15
+    metrics:
+    - type: f1_macro
+      value: 85.55
+      name: f1-macro
+    - type: pearson
+      value: 57.19
+      name: pearson
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=nicolasdec/cabra13b
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: FaQuAD NLI
+      type: ruanchaves/faquad-nli
+      split: test
+      args:
+        num_few_shot: 15
+    metrics:
+    - type: f1_macro
+      value: 45.45
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=nicolasdec/cabra13b
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: HateBR Binary
+      type: eduagarcia/portuguese_benchmark
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: f1_macro
+      value: 77.78
+      name: f1-macro
+    - type: f1_macro
+      value: 64.13
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=nicolasdec/cabra13b
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: tweetSentBR
+      type: eduagarcia-temp/tweetsentbr
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: f1_macro
+      value: 53.37
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=nicolasdec/cabra13b
+      name: Open Portuguese LLM Leaderboard
 ---
 
 **Conheça os nossos outros modelos (bem melhores): [Cabra](https://huggingface.co/collections/botbot-ai/models-6604c2069ceef04f834ba99b)**
@@ -13,3 +134,20 @@ O Cabra 13b é um qlora finetune do [LLaMA 2 13b Chat](https://huggingface.co/me
 
 O modelo precisa de mais treinamento, e pode gerar mentira ou inverdades.
 
+
+# [Open Portuguese LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/nicolasdec/cabra13b)
+
+| Metric | Value |
+|--------------------------|--------|
+|Average |**56.5**|
+|ENEM Challenge (No Images)| 48.85|
+|BLUEX (No Images) | 40.89|
+|OAB Exams | 35.31|
+|Assin2 RTE | 85.55|
+|Assin2 STS | 57.19|
+|FaQuAD NLI | 45.45|
+|HateBR Binary | 77.78|
+|PT Hate Speech Binary | 64.13|
+|tweetSentBR | 53.37|
+
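
The Average row in the results table is consistent with an unweighted arithmetic mean of the nine task scores. The PR does not state how the leaderboard computes it, so the sketch below only checks the arithmetic under that assumption; the `scores` dictionary is a hypothetical name that copies the values from the table.

```python
# Task scores copied from the results table added by this PR.
scores = {
    "ENEM Challenge (No Images)": 48.85,
    "BLUEX (No Images)": 40.89,
    "OAB Exams": 35.31,
    "Assin2 RTE": 85.55,
    "Assin2 STS": 57.19,
    "FaQuAD NLI": 45.45,
    "HateBR Binary": 77.78,
    "PT Hate Speech Binary": 64.13,
    "tweetSentBR": 53.37,
}

# Assumption: every task carries equal weight in the average.
average = sum(scores.values()) / len(scores)

# Rounded to one decimal place, this matches the table's Average row.
print(round(average, 1))
```

Under the equal-weight assumption the nine values sum to 508.52, and dividing by 9 gives roughly 56.50, which rounds to the reported 56.5.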