nicolasdec committed d48f2bf (parent f1b31c5): Create README.md
---
language:
- pt
- en
license: cc
tags:
- text-generation-inference
- transformers
- unsloth
- mistral
- gguf
- brazil
- brasil
- portuguese
base_model: mistralai/Mistral-7B-Instruct-v0.2
pipeline_tag: text-generation
---
# Cabra Mistral 7b v2
<img src="https://media.discordapp.net/attachments/1060891441724932096/1219303427000242316/blackpantera_cute_goat_with_red_M_in_the_background_brazil_flag_3b448f3a-d500-4f01-877f-2e469aba7dfc.png?ex=660acfce&is=65f85ace&hm=28ee401f092b558b11df54951270189641fe7d1173bfc4a5d633e53fb03c2d6d&=&format=webp&quality=lossless&width=350&height=350" width="400" height="400">

This model is a fine-tune of [Mistral 7b Instruct 0.2](https://huggingface.co/mistralai/mistral-7b-instruct-v0.2) trained on Cabra 5k, an internal dataset. It is optimized for Portuguese and responds in Portuguese.

**Try our demo here: [CabraChat](https://huggingface.co/spaces/nicolasdec/CabraChat).**

**Check out our other models fine-tuned for Portuguese: [Cabra](https://huggingface.co/collections/nicolasdec/cabra-65d12286c4d2b2e4029c0c63).**

## Model Details

### Model: Mistral 7b Instruct 0.2

Mistral-7B-v0.1, the original base model in this family, is a transformer with the following architectural choices:

- Grouped-Query Attention
- Sliding-Window Attention
- Byte-fallback BPE tokenizer

According to Mistral AI's model card, v0.2 changes three things relative to v0.1: a 32k context window (vs 8k), rope-theta = 1e6, and no Sliding-Window Attention.
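The first two attention choices can be illustrated with a small, dependency-free sketch. It assumes Mistral 7B's published head counts (32 query heads sharing 8 key/value heads) and uses a toy window size; `kv_head_for` and `sliding_window_mask` are hypothetical names, and this is not Mistral's implementation. The mask uses one convention (each token sees itself and the previous `window - 1` tokens); exact boundary conventions differ between implementations, and the sliding window applies to Mistral-7B-v0.1.

```python
# Illustrative sketch only: how Grouped-Query Attention (GQA) shares
# key/value heads, and which positions Sliding-Window Attention (SWA)
# lets each token see. Function names are hypothetical.

N_QUERY_HEADS = 32  # Mistral 7B query heads
N_KV_HEADS = 8      # Mistral 7B key/value heads


def kv_head_for(query_head: int, n_q: int = N_QUERY_HEADS, n_kv: int = N_KV_HEADS) -> int:
    """Return the key/value head that a given query head reads from under GQA."""
    if n_q % n_kv != 0:
        raise ValueError("query heads must be a multiple of key/value heads")
    group_size = n_q // n_kv  # number of query heads sharing one K/V head
    return query_head // group_size


def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """mask[i][j] is True when token i may attend to token j.

    Causal (j <= i) and limited to the last `window` positions (i - j < window).
    """
    return [[0 <= i - j < window for j in range(seq_len)] for i in range(seq_len)]


if __name__ == "__main__":
    # Query heads 0-3 share K/V head 0, heads 4-7 share K/V head 1, and so on.
    print([kv_head_for(h) for h in range(8)])
    # With a toy window of 3, the last token sees only the three most recent tokens.
    for row in sliding_window_mask(5, 3):
        print("".join("x" if allowed else "." for allowed in row))
```

Sharing each K/V head across a group of query heads shrinks the key/value cache by the group factor (4x here), which is the main practical motivation for GQA.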

### Dataset: Cabra 5k

Internal fine-tuning dataset. We will release it soon.

### Example

```
<s> [INST] who is Elon Musk? [/INST]Elon Musk é um empreendedor, inventor e capitalista americano. Ele é o fundador, CEO e CTO da SpaceX, CEO da Neuralink e fundador do The Boring Company. Musk também é o proprietário do Twitter.</s>
```
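The example follows the Mistral Instruct `[INST] ... [/INST]` format. A minimal sketch of building such a prompt for one or more turns, assuming the spacing shown in the example (no space between `[/INST]` and the reply); `build_prompt` is a hypothetical helper, not part of any library. In practice, `tokenizer.apply_chat_template` from `transformers` is the safer choice, since Mistral's own template may differ slightly in whitespace.

```python
# Hypothetical helper: formats a conversation in the Mistral Instruct
# "[INST] ... [/INST]" style used in the example. Spacing around the
# markers is an assumption taken from that example.
from typing import Optional

BOS, EOS = "<s>", "</s>"


def build_prompt(turns: list[tuple[str, Optional[str]]]) -> str:
    """Build a prompt from (user_message, assistant_reply) pairs.

    Leave the last reply as None so the model generates it.
    """
    parts = [BOS]
    for user_msg, reply in turns:
        parts.append(f"[INST] {user_msg.strip()} [/INST]")
        if reply is not None:
            # Completed assistant turns are closed with the end-of-sequence token.
            parts.append(f"{reply.strip()}{EOS}")
    return "".join(parts)


if __name__ == "__main__":
    print(build_prompt([("who is Elon Musk?", None)]))
    print(build_prompt([("Olá!", "Oi! Como posso ajudar?"), ("Qual é a capital do Brasil?", None)]))
```

If the string is passed to a tokenizer that prepends `<s>` on its own, dropping the leading `BOS` here may be needed to avoid a duplicated start token.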

### Training Parameters

```
- learning_rate: 1e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 8
- total_train_batch_size: 64
- total_eval_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.01
- num_epochs: 3
```
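The batch-size entries above are related by simple arithmetic, and `lr_scheduler_type: cosine` with `lr_scheduler_warmup_ratio: 0.01` describes a schedule that fits in a few lines. This is a minimal sketch assuming the Hugging Face Trainer conventions as we understand them (warmup steps = ceil(total_steps × warmup_ratio), linear warmup, then half-cosine decay to zero). The card does not report the total number of optimizer steps, so the step count used below is a hypothetical value, and `cosine_lr` is an illustrative name rather than a library function.

```python
# Sketch: how the listed batch sizes relate, plus the shape of a cosine
# learning-rate schedule with linear warmup (Trainer-style formula, assumed).
import math

LEARNING_RATE = 1e-05
TRAIN_BATCH_SIZE = 4   # per device
EVAL_BATCH_SIZE = 4    # per device
NUM_DEVICES = 2
GRAD_ACCUM_STEPS = 8
WARMUP_RATIO = 0.01

# Effective batch per optimizer step: per-device batch x devices x accumulation.
total_train_batch_size = TRAIN_BATCH_SIZE * NUM_DEVICES * GRAD_ACCUM_STEPS
# Evaluation does not accumulate gradients, so only the device count multiplies.
total_eval_batch_size = EVAL_BATCH_SIZE * NUM_DEVICES


def cosine_lr(step: int, total_steps: int, base_lr: float = LEARNING_RATE,
              warmup_ratio: float = WARMUP_RATIO) -> float:
    """Learning rate at `step`: linear warmup, then half-cosine decay to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    print(total_train_batch_size, total_eval_batch_size)  # 64 8
    hypothetical_total_steps = 200  # assumption, not reported in the card
    for s in (0, 1, 2, 101, 200):
        print(s, cosine_lr(s, hypothetical_total_steps))
```

With a warmup ratio of 0.01, warmup covers only about 1% of training, so almost the whole run follows the cosine decay from 1e-05 down to zero.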

### Framework

- Transformers 4.39.0.dev0
- PyTorch 2.1.2+cu118
- Datasets 2.14.6
- Tokenizers 0.15.2

## Usage

For now, the model is intended for research purposes. Possible research areas and tasks include:

- Research on generative models.
- Investigating and understanding the limitations and biases of generative models.

**Commercial use is prohibited. Research only.**

### Evals

| Tasks                         | Version | Filter                  | n-shot | Metric   | Value  | Stderr   |
|-------------------------------|---------|-------------------------|--------|----------|--------|----------|
| assin2_rte                    | 1.1     | all                     | 15     | f1_macro | 0.9013 | ± 0.0043 |
|                               |         | all                     | 15     | acc      | 0.9016 | ± 0.0043 |
| assin2_sts                    | 1.1     | all                     | 15     | pearson  | 0.7151 | ± 0.0074 |
|                               |         | all                     | 15     | mse      | 0.6803 | ± N/A    |
| bluex                         | 1.1     | all                     | 3      | acc      | 0.4798 | ± 0.0107 |
|                               |         | exam_id__USP_2019       | 3      | acc      | 0.375  | ± 0.044  |
|                               |         | exam_id__USP_2021       | 3      | acc      | 0.3462 | ± 0.0382 |
|                               |         | exam_id__USP_2020       | 3      | acc      | 0.4107 | ± 0.0379 |
|                               |         | exam_id__UNICAMP_2018   | 3      | acc      | 0.4815 | ± 0.0392 |
|                               |         | exam_id__UNICAMP_2020   | 3      | acc      | 0.4727 | ± 0.0389 |
|                               |         | exam_id__UNICAMP_2021_1 | 3      | acc      | 0.413  | ± 0.0418 |
|                               |         | exam_id__UNICAMP_2019   | 3      | acc      | 0.42   | ± 0.0404 |
|                               |         | exam_id__UNICAMP_2022   | 3      | acc      | 0.5897 | ± 0.0456 |
|                               |         | exam_id__USP_2022       | 3      | acc      | 0.449  | ± 0.041  |
|                               |         | exam_id__USP_2024       | 3      | acc      | 0.6341 | ± 0.0434 |
|                               |         | exam_id__UNICAMP_2024   | 3      | acc      | 0.6    | ± 0.0422 |
|                               |         | exam_id__USP_2023       | 3      | acc      | 0.5455 | ± 0.0433 |
|                               |         | exam_id__UNICAMP_2023   | 3      | acc      | 0.5349 | ± 0.044  |
|                               |         | exam_id__USP_2018       | 3      | acc      | 0.4815 | ± 0.0393 |
|                               |         | exam_id__UNICAMP_2021_2 | 3      | acc      | 0.5098 | ± 0.0403 |
| enem                          | 1.1     | all                     | 3      | acc      | 0.5843 | ± 0.0075 |
|                               |         | exam_id__2010           | 3      | acc      | 0.5726 | ± 0.0264 |
|                               |         | exam_id__2009           | 3      | acc      | 0.6    | ± 0.0264 |
|                               |         | exam_id__2014           | 3      | acc      | 0.633  | ± 0.0268 |
|                               |         | exam_id__2022           | 3      | acc      | 0.6165 | ± 0.0243 |
|                               |         | exam_id__2012           | 3      | acc      | 0.569  | ± 0.0265 |
|                               |         | exam_id__2013           | 3      | acc      | 0.5833 | ± 0.0274 |
|                               |         | exam_id__2016_2         | 3      | acc      | 0.5203 | ± 0.026  |
|                               |         | exam_id__2011           | 3      | acc      | 0.6325 | ± 0.0257 |
|                               |         | exam_id__2023           | 3      | acc      | 0.5778 | ± 0.0246 |
|                               |         | exam_id__2016           | 3      | acc      | 0.595  | ± 0.0258 |
|                               |         | exam_id__2017           | 3      | acc      | 0.5517 | ± 0.0267 |
|                               |         | exam_id__2015           | 3      | acc      | 0.563  | ± 0.0261 |
| faquad_nli                    | 1.1     | all                     | 15     | f1_macro | 0.6424 | ± 0.0138 |
|                               |         | all                     | 15     | acc      | 0.6769 | ± 0.013  |
| hatebr_offensive_binary       | 1       | all                     | 25     | f1_macro | 0.8361 | ± 0.007  |
|                               |         | all                     | 25     | acc      | 0.8371 | ± 0.007  |
| oab_exams                     | 1.5     | all                     | 3      | acc      | 0.3841 | ± 0.006  |
|                               |         | exam_id__2011-03        | 3      | acc      | 0.3636 | ± 0.0279 |
|                               |         | exam_id__2014-14        | 3      | acc      | 0.475  | ± 0.0323 |
|                               |         | exam_id__2016-21        | 3      | acc      | 0.4125 | ± 0.0318 |
|                               |         | exam_id__2012-06a       | 3      | acc      | 0.3875 | ± 0.0313 |
|                               |         | exam_id__2014-13        | 3      | acc      | 0.325  | ± 0.0303 |
|                               |         | exam_id__2015-16        | 3      | acc      | 0.425  | ± 0.032  |
|                               |         | exam_id__2010-02        | 3      | acc      | 0.4    | ± 0.0283 |
|                               |         | exam_id__2012-08        | 3      | acc      | 0.3875 | ± 0.0314 |
|                               |         | exam_id__2011-05        | 3      | acc      | 0.375  | ± 0.0312 |
|                               |         | exam_id__2017-22        | 3      | acc      | 0.4    | ± 0.0316 |
|                               |         | exam_id__2018-25        | 3      | acc      | 0.4125 | ± 0.0318 |
|                               |         | exam_id__2012-09        | 3      | acc      | 0.3636 | ± 0.0317 |
|                               |         | exam_id__2017-24        | 3      | acc      | 0.3375 | ± 0.0304 |
|                               |         | exam_id__2016-20a       | 3      | acc      | 0.3125 | ± 0.0299 |
|                               |         | exam_id__2012-06        | 3      | acc      | 0.425  | ± 0.0318 |
|                               |         | exam_id__2013-12        | 3      | acc      | 0.4375 | ± 0.0321 |
|                               |         | exam_id__2016-20        | 3      | acc      | 0.45   | ± 0.0322 |
|                               |         | exam_id__2013-11        | 3      | acc      | 0.4    | ± 0.0316 |
|                               |         | exam_id__2015-17        | 3      | acc      | 0.4231 | ± 0.0323 |
|                               |         | exam_id__2015-18        | 3      | acc      | 0.4    | ± 0.0316 |
|                               |         | exam_id__2017-23        | 3      | acc      | 0.35   | ± 0.0308 |
|                               |         | exam_id__2010-01        | 3      | acc      | 0.2471 | ± 0.0271 |
|                               |         | exam_id__2011-04        | 3      | acc      | 0.375  | ± 0.0313 |
|                               |         | exam_id__2016-19        | 3      | acc      | 0.4103 | ± 0.0321 |
|                               |         | exam_id__2013-10        | 3      | acc      | 0.3375 | ± 0.0305 |
|                               |         | exam_id__2012-07        | 3      | acc      | 0.3625 | ± 0.031  |
|                               |         | exam_id__2014-15        | 3      | acc      | 0.3846 | ± 0.0318 |
| portuguese_hate_speech_binary | 1       | all                     | 25     | f1_macro | 0.6187 | ± 0.0119 |
|                               |         | all                     | 25     | acc      | 0.6322 | ± 0.0117 |
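The Stderr column can be turned into an approximate confidence interval. A minimal sketch, assuming each metric's sampling distribution is roughly normal so that a 95% interval is value ± 1.96 × stderr; rows reported as `± N/A` get no interval. `confidence_interval` is a hypothetical helper, and the values plugged in are copied from the table above.

```python
# Hypothetical helper: approximate 95% confidence interval from a reported
# value and its standard error, under a normal approximation.
from typing import Optional, Tuple

Z_95 = 1.96  # two-sided 95% quantile of the standard normal distribution


def confidence_interval(value: float, stderr: Optional[float],
                        z: float = Z_95) -> Optional[Tuple[float, float]]:
    """Return (low, high), or None when no standard error was reported."""
    if stderr is None:
        return None
    return (value - z * stderr, value + z * stderr)


if __name__ == "__main__":
    # (value, stderr) pairs taken from the Evals table.
    rows = {
        "assin2_rte f1_macro": (0.9013, 0.0043),
        "enem acc": (0.5843, 0.0075),
        "oab_exams acc": (0.3841, 0.006),
        "assin2_sts mse": (0.6803, None),  # reported as "± N/A"
    }
    for name, (value, se) in rows.items():
        ci = confidence_interval(value, se)
        print(name, "n/a" if ci is None else f"[{ci[0]:.4f}, {ci[1]:.4f}]")
```

Per-exam standard errors are much larger than the aggregate ones (around ±0.04 for individual BLUEX exams, i.e. roughly ±0.08 at 95%), so small differences between exam years should be read with caution.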