leaderboard-pt-pr-bot committed on
Commit
27d1e2a
1 Parent(s): a3f0b78

Adding the Open Portuguese LLM Leaderboard Evaluation Results

This is an automated PR created with https://huggingface.co/spaces/eduagarcia-temp/portuguese-leaderboard-results-to-modelcard

The purpose of this PR is to add evaluation results from the Open Portuguese LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/eduagarcia-temp/portuguese-leaderboard-results-to-modelcard/discussions

Files changed (1)
  1. README.md +148 -10
README.md CHANGED
@@ -1,19 +1,140 @@
  ---
  language:
- - pt
- - en
  license: cc
  tags:
- - text-generation-inference
- - transformers
- - qwen
- - gguf
- - brazil
- - brasil
- - 14b
- - portuguese
  base_model: Qwen/Qwen1.5-14B-Chat
  pipeline_tag: text-generation
  ---
  # Cabra Qwen 14b
  <img src="https://uploads-ssl.webflow.com/65f77c0240ae1c68f8192771/660b1a4de37e3389b7220262_cabra3.png" width="400" height="400">
@@ -161,3 +282,20 @@ O modelo é destinado, por agora, a fins de pesquisa. As áreas e tarefas de pes
  | | | exam_id__2014-15 | 3 | acc | 0.5897 | ± 0.0323 |
  | portuguese_hate_speech_binary | 1.0 | all | 25 | f1_macro | 0.7180 | ± 0.0115 |
  | | | all | 25 | acc | 0.7462 | ± 0.0106 |
  ---
  language:
+ - pt
+ - en
  license: cc
  tags:
+ - text-generation-inference
+ - transformers
+ - qwen
+ - gguf
+ - brazil
+ - brasil
+ - 14b
+ - portuguese
  base_model: Qwen/Qwen1.5-14B-Chat
  pipeline_tag: text-generation
+ model-index:
+ - name: CabraQwen14b
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: ENEM Challenge (No Images)
+       type: eduagarcia/enem_challenge
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 75.16
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=nicolasdec/CabraQwen14b
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: BLUEX (No Images)
+       type: eduagarcia-temp/BLUEX_without_images
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 60.78
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=nicolasdec/CabraQwen14b
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: OAB Exams
+       type: eduagarcia/oab_exams
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 49.89
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=nicolasdec/CabraQwen14b
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Assin2 RTE
+       type: assin2
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: f1_macro
+       value: 91.42
+       name: f1-macro
+     - type: pearson
+       value: 80.85
+       name: pearson
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=nicolasdec/CabraQwen14b
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: FaQuAD NLI
+       type: ruanchaves/faquad-nli
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: f1_macro
+       value: 46.05
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=nicolasdec/CabraQwen14b
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: HateBR Binary
+       type: eduagarcia/portuguese_benchmark
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 79.32
+       name: f1-macro
+     - type: f1_macro
+       value: 71.8
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=nicolasdec/CabraQwen14b
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: tweetSentBR
+       type: eduagarcia-temp/tweetsentbr
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 62.65
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=nicolasdec/CabraQwen14b
+       name: Open Portuguese LLM Leaderboard
  ---
  # Cabra Qwen 14b
  <img src="https://uploads-ssl.webflow.com/65f77c0240ae1c68f8192771/660b1a4de37e3389b7220262_cabra3.png" width="400" height="400">
  | | | exam_id__2014-15 | 3 | acc | 0.5897 | ± 0.0323 |
  | portuguese_hate_speech_binary | 1.0 | all | 25 | f1_macro | 0.7180 | ± 0.0115 |
  | | | all | 25 | acc | 0.7462 | ± 0.0106 |
+
+ # [Open Portuguese LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/nicolasdec/CabraQwen14b)
+
+ |          Metric          |  Value  |
+ |--------------------------|---------|
+ |Average                   |**68.66**|
+ |ENEM Challenge (No Images)|    75.16|
+ |BLUEX (No Images)         |    60.78|
+ |OAB Exams                 |    49.89|
+ |Assin2 RTE                |    91.42|
+ |Assin2 STS                |    80.85|
+ |FaQuAD NLI                |    46.05|
+ |HateBR Binary             |    79.32|
+ |PT Hate Speech Binary     |    71.80|
+ |tweetSentBR               |    62.65|
+
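
The PR does not say how the Average row in the added table is computed. A minimal sketch, assuming it is the unweighted arithmetic mean of the nine task scores rounded to two decimals; the function name `leaderboard_average` is hypothetical, and the scores are copied from the table in this diff:

```python
# Scores copied from the results table added to README.md by this PR.
scores = {
    "ENEM Challenge (No Images)": 75.16,
    "BLUEX (No Images)": 60.78,
    "OAB Exams": 49.89,
    "Assin2 RTE": 91.42,
    "Assin2 STS": 80.85,
    "FaQuAD NLI": 46.05,
    "HateBR Binary": 79.32,
    "PT Hate Speech Binary": 71.80,
    "tweetSentBR": 62.65,
}


def leaderboard_average(values):
    """Unweighted mean rounded to two decimals (assumed aggregation rule)."""
    values = list(values)
    return round(sum(values) / len(values), 2)


average = leaderboard_average(scores.values())
print(f"Average over {len(scores)} tasks: {average}")
```

Under that assumption the result agrees with the 68.66 reported in the table, so the nine rows and the Average row are at least mutually consistent.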