Commit 4b661dd
Parent: 973adc2

Adding the Open Portuguese LLM Leaderboard Evaluation Results (#1)


- Adding the Open Portuguese LLM Leaderboard Evaluation Results (9caf5756b1d8b0ad117a7d25a51d870eef6ddbaa)


Co-authored-by: Open PT LLM Leaderboard PR Bot <[email protected]>

Files changed (1)
  1. README.md +146 -9
README.md CHANGED
@@ -1,14 +1,9 @@
 ---
-license: mit
 language:
 - pt
 - en
-metrics:
-- accuracy
-- f1
-- precision
-- recall
-pipeline_tag: text-generation
+license: mit
+library_name: peft
 tags:
 - LLM
 - Portuguese
@@ -16,8 +11,134 @@ tags:
 - Alpaca
 - Llama 2
 - Q&A
-library_name: peft
+metrics:
+- accuracy
+- f1
+- precision
+- recall
+pipeline_tag: text-generation
 inference: false
+model-index:
+- name: bode-13b-alpaca-pt-br
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: ENEM Challenge (No Images)
+      type: eduagarcia/enem_challenge
+      split: train
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc
+      value: 33.66
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/bode-13b-alpaca-pt-br
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: BLUEX (No Images)
+      type: eduagarcia-temp/BLUEX_without_images
+      split: train
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc
+      value: 38.25
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/bode-13b-alpaca-pt-br
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: OAB Exams
+      type: eduagarcia/oab_exams
+      split: train
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc
+      value: 36.04
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/bode-13b-alpaca-pt-br
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Assin2 RTE
+      type: assin2
+      split: test
+      args:
+        num_few_shot: 15
+    metrics:
+    - type: f1_macro
+      value: 71.22
+      name: f1-macro
+    - type: pearson
+      value: 46.75
+      name: pearson
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/bode-13b-alpaca-pt-br
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: FaQuAD NLI
+      type: ruanchaves/faquad-nli
+      split: test
+      args:
+        num_few_shot: 15
+    metrics:
+    - type: f1_macro
+      value: 51.68
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/bode-13b-alpaca-pt-br
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: HateBR Binary
+      type: eduagarcia/portuguese_benchmark
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: f1_macro
+      value: 82.21
+      name: f1-macro
+    - type: f1_macro
+      value: 65.54
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/bode-13b-alpaca-pt-br
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: tweetSentBR
+      type: eduagarcia-temp/tweetsentbr
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: f1_macro
+      value: 47.55
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/bode-13b-alpaca-pt-br
+      name: Open Portuguese LLM Leaderboard
 ---
 
 # BODE
@@ -147,4 +268,20 @@ Contribuições para a melhoria deste modelo são bem-vindas. Sinta-se à vontad
 
 ## Agradecimentos
 
-Agradecemos ao Laboratório Nacional de Computação Científica (LNCC/MCTI, Brasil) por prover os recursos de CAD do supercomputador SDumont.
+Agradecemos ao Laboratório Nacional de Computação Científica (LNCC/MCTI, Brasil) por prover os recursos de CAD do supercomputador SDumont.
+# [Open Portuguese LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/recogna-nlp/bode-13b-alpaca-pt-br)
+
+| Metric | Value |
+|--------------------------|---------|
+|Average |**52.54**|
+|ENEM Challenge (No Images)| 33.66|
+|BLUEX (No Images) | 38.25|
+|OAB Exams | 36.04|
+|Assin2 RTE | 71.22|
+|Assin2 STS | 46.75|
+|FaQuAD NLI | 51.68|
+|HateBR Binary | 82.21|
+|PT Hate Speech Binary | 65.54|
+|tweetSentBR | 47.55|
+
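For reference, the `model-index` block added in the diff above is the machine-readable counterpart of the results table: each `metrics` item (`acc`, `f1_macro`, `pearson`) carries one score, and the leaderboard's Average row is the plain mean of the nine scores (33.66 + 38.25 + 36.04 + 71.22 + 46.75 + 51.68 + 82.21 + 65.54 + 47.55 = 472.90, and 472.90 / 9 ≈ 52.54). A minimal sketch of reading the block back and re-deriving that average, assuming the `huggingface_hub` `ModelCard`/`EvalResult` helpers and access to the Hub:

```python
# Sketch: read the model-index block from the updated card and re-derive the
# leaderboard average. Assumes `huggingface_hub` is installed and the repo is
# reachable; the repo id is taken from the source URLs in the diff above.
from huggingface_hub import ModelCard

card = ModelCard.load("recogna-nlp/bode-13b-alpaca-pt-br")

# Each metric entry in model-index surfaces as one EvalResult with a numeric
# metric_value (so a dataset with two metrics contributes two results).
results = card.data.eval_results
for r in results:
    print(f"{r.dataset_name:<28} {r.metric_type:<9} {r.metric_value:6.2f}")

# Nine scores in total; their mean should reproduce the Average row (52.54).
print(f"Average: {sum(r.metric_value for r in results) / len(results):.2f}")
```

Offline, the same numbers can be recovered by running `yaml.safe_load` on the text between the first two `---` markers of README.md and walking `model-index[0]["results"]`.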