Commit 5482d5f
1 Parent(s): c4d7e8d

Adding the Open Portuguese LLM Leaderboard Evaluation Results (#2)

- Adding the Open Portuguese LLM Leaderboard Evaluation Results (69ed565e10c1efd34cde64fe26fb80493a826bab)


Co-authored-by: Open PT LLM Leaderboard PR Bot <[email protected]>

Files changed (1)
  1. README.md +141 -4
README.md CHANGED
@@ -1,18 +1,139 @@
  ---
+ language:
+ - pt
+ license: mit
  library_name: peft
  tags:
  - Gemma
  - Portuguese
  - Bode
  - Alpaca
- license: mit
- language:
- - pt
  metrics:
  - accuracy
  - precision
  - f1
  - recall
+ model-index:
+ - name: GemBode-2b-it
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: ENEM Challenge (No Images)
+       type: eduagarcia/enem_challenge
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 21.62
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/GemBode-2b-it
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: BLUEX (No Images)
+       type: eduagarcia-temp/BLUEX_without_images
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 25.45
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/GemBode-2b-it
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: OAB Exams
+       type: eduagarcia/oab_exams
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 27.33
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/GemBode-2b-it
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Assin2 RTE
+       type: assin2
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: f1_macro
+       value: 53.1
+       name: f1-macro
+     - type: pearson
+       value: 15.57
+       name: pearson
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/GemBode-2b-it
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: FaQuAD NLI
+       type: ruanchaves/faquad-nli
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: f1_macro
+       value: 53.05
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/GemBode-2b-it
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: HateBR Binary
+       type: eduagarcia/portuguese_benchmark
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 66.89
+       name: f1-macro
+     - type: f1_macro
+       value: 24.22
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/GemBode-2b-it
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: tweetSentBR
+       type: eduagarcia-temp/tweetsentbr
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 37.47
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/GemBode-2b-it
+       name: Open Portuguese LLM Leaderboard
  ---

  # GemBode-2b-it
@@ -110,4 +231,20 @@ Se você deseja utilizar o GemBode-2b-it em sua pesquisa, cite-o da seguinte man
  doi = { 10.57967/hf/1879 },
  publisher = { Hugging Face }
  }
- ```
+ ```
+ # [Open Portuguese LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/recogna-nlp/GemBode-2b-it)
+
+ |          Metric          |  Value  |
+ |--------------------------|---------|
+ |Average                   |**36.08**|
+ |ENEM Challenge (No Images)|    21.62|
+ |BLUEX (No Images)         |    25.45|
+ |OAB Exams                 |    27.33|
+ |Assin2 RTE                |    53.10|
+ |Assin2 STS                |    15.57|
+ |FaQuAD NLI                |    53.05|
+ |HateBR Binary             |    66.89|
+ |PT Hate Speech Binary     |    24.22|
+ |tweetSentBR               |    37.47|
+
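
The Average row in the results table appears to be the unweighted mean of the nine task scores. The commit does not state how the leaderboard computes it, so the following is only a minimal Python sketch under that assumption. The `scores` dictionary and the `leaderboard_average` helper are illustrative names, not part of the leaderboard's code; the values are copied from the table.

```python
# Task scores copied from the results table added to README.md.
scores = {
    "ENEM Challenge (No Images)": 21.62,
    "BLUEX (No Images)": 25.45,
    "OAB Exams": 27.33,
    "Assin2 RTE": 53.10,
    "Assin2 STS": 15.57,
    "FaQuAD NLI": 53.05,
    "HateBR Binary": 66.89,
    "PT Hate Speech Binary": 24.22,
    "tweetSentBR": 37.47,
}


def leaderboard_average(task_scores: dict[str, float]) -> float:
    """Unweighted arithmetic mean rounded to two decimals.

    Assumption: every task carries equal weight; the commit itself
    does not document the averaging rule.
    """
    return round(sum(task_scores.values()) / len(task_scores), 2)


average = leaderboard_average(scores)
print(f"Average over {len(scores)} tasks: {average:.2f}")
```

Under the equal-weight assumption, the result agrees with the bolded 36.08 in the table, which suggests the nine rows are the full set of inputs to the average.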