metadata

language:
  - pt
  - en
license: cc
tags:
  - text-generation-inference
  - transformers
  - mistral
  - gguf
  - brazil
  - brasil
  - portuguese
base_model: mistralai/Mistral-7B-Instruct-v0.2
metrics:
  - name: assin2_rte f1_macro
    type: assin2_rte
    value: 90.13
  - name: assin2_rte acc
    type: assin2_rte
    value: 90.16
  - name: assin2_sts pearson
    type: assin2_sts
    value: 71.51
  - name: assin2_sts mse
    type: assin2_sts
    value: 68.03
  - name: bluex acc
    type: bluex
    value: 47.98
  - name: enem acc
    type: enem
    value: 58.43
  - name: faquad_nli f1_macro
    type: faquad_nli
    value: 64.24
  - name: faquad_nli acc
    type: faquad_nli
    value: 67.69
  - name: hatebr_offensive_binary f1_macro
    type: hatebr_offensive_binary
    value: 83.61
  - name: hatebr_offensive_binary acc
    type: hatebr_offensive_binary
    value: 83.71
  - name: oab_exams acc
    type: oab_exams
    value: 38.41
  - name: portuguese_hate_speech_binary f1_macro
    type: portuguese_hate_speech_binary
    value: 61.87
  - name: portuguese_hate_speech_binary acc
    type: portuguese_hate_speech_binary
    value: 63.22
pipeline_tag: text-generation
model-index:
  - name: CabraMistral7b
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: ENEM Challenge (No Images)
          type: eduagarcia/enem_challenge
          split: train
          args:
            num_few_shot: 3
        metrics:
          - type: acc
            value: 60.81
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=nicolasdec/CabraMistral7b
          name: Open Portuguese LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: BLUEX (No Images)
          type: eduagarcia-temp/BLUEX_without_images
          split: train
          args:
            num_few_shot: 3
        metrics:
          - type: acc
            value: 46.87
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=nicolasdec/CabraMistral7b
          name: Open Portuguese LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: OAB Exams
          type: eduagarcia/oab_exams
          split: train
          args:
            num_few_shot: 3
        metrics:
          - type: acc
            value: 38.59
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=nicolasdec/CabraMistral7b
          name: Open Portuguese LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Assin2 RTE
          type: assin2
          split: test
          args:
            num_few_shot: 15
        metrics:
          - type: f1_macro
            value: 90.27
            name: f1-macro
          - type: pearson
            value: 72.25
            name: pearson
        source:
          url: >-
            https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=nicolasdec/CabraMistral7b
          name: Open Portuguese LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: FaQuAD NLI
          type: ruanchaves/faquad-nli
          split: test
          args:
            num_few_shot: 15
        metrics:
          - type: f1_macro
            value: 64.35
            name: f1-macro
        source:
          url: >-
            https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=nicolasdec/CabraMistral7b
          name: Open Portuguese LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: HateBR Binary
          type: eduagarcia/portuguese_benchmark
          split: test
          args:
            num_few_shot: 25
        metrics:
          - type: f1_macro
            value: 83.15
            name: f1-macro
          - type: f1_macro
            value: 64.82
            name: f1-macro
        source:
          url: >-
            https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=nicolasdec/CabraMistral7b
          name: Open Portuguese LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: tweetSentBR
          type: eduagarcia-temp/tweetsentbr
          split: test
          args:
            num_few_shot: 25
        metrics:
          - type: f1_macro
            value: 64.8
            name: f1-macro
        source:
          url: >-
            https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=nicolasdec/CabraMistral7b
          name: Open Portuguese LLM Leaderboard

Cabra Mistral 7b v2

This model is a fine-tune of Mistral 7B Instruct v0.2 on the internal Cabra 10k dataset. It is optimized for Portuguese and improves on several Brazilian benchmarks compared with the base model.

Try our demo here: CabraChat.

Check out our other models: Cabra.

Model Details

Base model: Mistral 7B Instruct v0.2

Mistral-7B-v0.1 is a transformer model with the following architectural choices:

- Grouped-Query Attention
- Sliding-Window Attention
- Byte-fallback BPE tokenizer
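The first two items in the list can be illustrated with a small, self-contained Python sketch. It builds a causal sliding-window attention mask and shows how grouped-query attention (GQA) maps query heads onto shared key/value heads. The defaults are the values published for Mistral-7B-v0.1 and are assumptions here: a 4096-token window, 32 query heads, and 8 KV heads. Mistral's notes for v0.2, the base of this model, state that sliding-window attention was removed, so treat the mask as an illustration of the technique rather than of this checkpoint. The function names are hypothetical, and the off-by-one convention for the window size varies between implementations.

```python
# Illustrative sketch only: a causal sliding-window attention mask and the
# grouped-query-attention (GQA) mapping from query heads to shared KV heads.
# Defaults (window 4096, 32 query heads, 8 KV heads) are the values published
# for Mistral-7B-v0.1; treat them as assumptions for other versions.


def sliding_window_mask(seq_len: int, window: int = 4096) -> list[list[bool]]:
    """mask[i][j] is True when query position i may attend to key position j.

    Attention is causal (j <= i) and limited to the `window` most recent
    positions including i itself, i.e. i - window < j <= i.
    """
    return [
        [i - window < j <= i for j in range(seq_len)]
        for i in range(seq_len)
    ]


def kv_head_for_query_head(q_head: int, n_q_heads: int = 32, n_kv_heads: int = 8) -> int:
    """Return the index of the KV head shared by a given query head under GQA.

    Consecutive groups of n_q_heads // n_kv_heads query heads reuse the same
    key/value projections, which shrinks the KV cache by that factor.
    """
    if n_q_heads % n_kv_heads != 0:
        raise ValueError("n_q_heads must be a multiple of n_kv_heads")
    group_size = n_q_heads // n_kv_heads
    return q_head // group_size


if __name__ == "__main__":
    # With a window of 3, position 4 sees positions 2, 3 and 4 only.
    for row in sliding_window_mask(5, window=3):
        print("".join("x" if allowed else "." for allowed in row))
    # Query heads 0-3 share KV head 0, heads 4-7 share KV head 1, and so on.
    print([kv_head_for_query_head(h) for h in range(8)])
```

With 32 query heads and 8 KV heads, the KV cache stores one key/value pair per group of four query heads instead of one per head.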

Dataset: Cabra 10k

Internal fine-tuning dataset. We plan to release it soon.

Quantization / GGUF

Several quantized (GGUF) versions are available in the "quantanization" branch.
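GGUF quantization stores weights as low-bit integers plus per-block scale factors. The sketch below is a simplified illustration of that idea. It implements block-wise absmax 8-bit quantization in the spirit of GGUF's Q8_0 type, where blocks of 32 weights share one scale. It does not read or write GGUF files, and the quantization types actually published in the "quantanization" branch are not specified here. All function names are hypothetical.

```python
# Simplified block-wise absmax 8-bit quantization, in the spirit of GGUF's
# Q8_0 type. Illustration only: real GGUF files store fp16 scales in a packed
# binary layout, and this code neither reads nor writes them.

BLOCK_SIZE = 32  # weights per block that share a single scale


def quantize_block(values: list[float]) -> tuple[float, list[int]]:
    """Map one block of floats to int8 codes in [-127, 127] plus a scale."""
    amax = max(abs(v) for v in values)
    if amax == 0.0:
        # All-zero block: use a zero scale and zero codes.
        return 0.0, [0] * len(values)
    scale = amax / 127.0
    return scale, [round(v / scale) for v in values]


def dequantize_block(scale: float, codes: list[int]) -> list[float]:
    """Recover approximate floats: value ~= scale * code."""
    return [scale * q for q in codes]


def quantize(weights: list[float], block_size: int = BLOCK_SIZE) -> list[tuple[float, list[int]]]:
    """Quantize each consecutive block of weights independently."""
    return [
        quantize_block(weights[i : i + block_size])
        for i in range(0, len(weights), block_size)
    ]


def dequantize(blocks: list[tuple[float, list[int]]]) -> list[float]:
    """Concatenate the dequantized blocks back into one flat list."""
    out: list[float] = []
    for scale, codes in blocks:
        out.extend(dequantize_block(scale, codes))
    return out


if __name__ == "__main__":
    weights = [((i * 37) % 101 - 50) / 50.0 for i in range(64)]
    blocks = quantize(weights)
    restored = dequantize(blocks)
    worst = max(abs(a - b) for a, b in zip(weights, restored))
    # Rounding error per weight is at most half a quantization step (scale / 2).
    print(len(blocks), worst)
```

Using one scale per small block, rather than per tensor, keeps outliers in one block from degrading the precision of every other block.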

Example

<s> [INST] who is Elon Musk? [/INST]Elon Musk é um empreendedor, inventor e capitalista americano. Ele é o fundador, CEO e CTO da SpaceX, CEO da Neuralink e fundador do The Boring Company. Musk também é o proprietário do Twitter.</s>
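The example follows the Mistral Instruct prompt layout. Each user turn is wrapped in `[INST] ... [/INST]`, and each completed assistant reply ends with `</s>`. The hypothetical helper below builds such a string by hand to make the structure explicit.

With `transformers`, `tokenizer.apply_chat_template` is the safer route in practice. The tokenizer adds the `<s>` BOS token itself, so a hand-built string tokenized with special tokens enabled can end up with a duplicated BOS.

```python
from typing import Optional


def build_mistral_prompt(turns: list[tuple[str, Optional[str]]]) -> str:
    """Format (user, assistant) turns in the Mistral Instruct layout.

    Hypothetical helper: only the final turn may have assistant=None, which
    leaves the prompt open for the model to generate the next reply.
    """
    prompt = "<s>"
    for i, (user, assistant) in enumerate(turns):
        prompt += f"[INST] {user.strip()} [/INST]"
        if assistant is not None:
            # A finished reply is closed with the EOS token.
            prompt += f" {assistant.strip()}</s>"
        elif i != len(turns) - 1:
            raise ValueError("only the final turn may omit the assistant reply")
    return prompt


if __name__ == "__main__":
    print(build_mistral_prompt([("who is Elon Musk?", None)]))
    print(build_mistral_prompt([("Oi", "Olá!"), ("Tudo bem?", None)]))
```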

Training hyperparameters

- learning_rate: 1e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 8
- total_train_batch_size: 64
- total_eval_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.01
- num_epochs: 3
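The batch-size entries above follow from the per-device settings. Training uses 4 examples per device × 2 devices × 8 accumulation steps, which gives 64 examples per optimizer step; evaluation uses 4 × 2 = 8.

The sketch below reproduces that arithmetic. It also implements a linear-warmup plus cosine-decay schedule with the same shape as `transformers`' `get_cosine_schedule_with_warmup`. The dataset size of 10,000 examples is a hypothetical value guessed from the name "Cabra 10k". The resulting step counts are approximate, since the Trainer's own counting (per-process dataloaders, rounding) can differ by a step or so.

```python
import math

# Hyperparameters copied from the list above.
LEARNING_RATE = 1e-05
TRAIN_BATCH_SIZE = 4  # per device
EVAL_BATCH_SIZE = 4  # per device
NUM_DEVICES = 2
GRADIENT_ACCUMULATION_STEPS = 8
WARMUP_RATIO = 0.01
NUM_EPOCHS = 3


def total_train_batch_size() -> int:
    """Examples contributing to one optimizer step."""
    return TRAIN_BATCH_SIZE * NUM_DEVICES * GRADIENT_ACCUMULATION_STEPS


def total_eval_batch_size() -> int:
    """No gradient accumulation at evaluation time."""
    return EVAL_BATCH_SIZE * NUM_DEVICES


def cosine_lr(step: int, total_steps: int, warmup_steps: int, base_lr: float = LEARNING_RATE) -> float:
    """Linear warmup from 0 to base_lr, then half-cosine decay to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    print(total_train_batch_size(), total_eval_batch_size())  # 64 8
    # Hypothetical dataset size, guessed from the name "Cabra 10k".
    num_examples = 10_000
    steps_per_epoch = math.ceil(num_examples / total_train_batch_size())
    total_steps = steps_per_epoch * NUM_EPOCHS
    warmup_steps = math.ceil(total_steps * WARMUP_RATIO)
    print(steps_per_epoch, total_steps, warmup_steps)
    for step in (0, warmup_steps, total_steps // 2, total_steps):
        print(step, cosine_lr(step, total_steps, warmup_steps))
```

With a warmup ratio of 0.01, warmup lasts only a handful of steps. After that, the learning rate is 1e-05 at the end of warmup, half of that at the midpoint of the decay phase, and 0 at the final step.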

Framework

Transformers 4.39.0.dev0
Pytorch 2.1.2+cu118
Datasets 2.14.6
Tokenizers 0.15.2

Usage

For now, the model is intended for research purposes only. Possible research areas and tasks include:

- Research on generative models.
- Investigating and understanding the limitations and biases of generative models.

Commercial use is prohibited. Research use only.

Evals

Tasks	Version	Filter	n-shot	Metric	Value	Stderr
assin2_rte	1.1	all	15	f1_macro	0.9013	± 0.0043
		all	15	acc	0.9016	± 0.0043
assin2_sts	1.1	all	15	pearson	0.7151	± 0.0074
		all	15	mse	0.6803	± N/A
bluex	1.1	all	3	acc	0.4798	± 0.0107
		exam_id__USP_2019	3	acc	0.375	± 0.044
		exam_id__USP_2021	3	acc	0.3462	± 0.0382
		exam_id__USP_2020	3	acc	0.4107	± 0.0379
		exam_id__UNICAMP_2018	3	acc	0.4815	± 0.0392
		exam_id__UNICAMP_2020	3	acc	0.4727	± 0.0389
		exam_id__UNICAMP_2021_1	3	acc	0.413	± 0.0418
		exam_id__UNICAMP_2019	3	acc	0.42	± 0.0404
		exam_id__UNICAMP_2022	3	acc	0.5897	± 0.0456
		exam_id__USP_2022	3	acc	0.449	± 0.041
		exam_id__USP_2024	3	acc	0.6341	± 0.0434
		exam_id__UNICAMP_2024	3	acc	0.6	± 0.0422
		exam_id__USP_2023	3	acc	0.5455	± 0.0433
		exam_id__UNICAMP_2023	3	acc	0.5349	± 0.044
		exam_id__USP_2018	3	acc	0.4815	± 0.0393
		exam_id__UNICAMP_2021_2	3	acc	0.5098	± 0.0403
enem	1.1	all	3	acc	0.5843	± 0.0075
		exam_id__2010	3	acc	0.5726	± 0.0264
		exam_id__2009	3	acc	0.6	± 0.0264
		exam_id__2014	3	acc	0.633	± 0.0268
		exam_id__2022	3	acc	0.6165	± 0.0243
		exam_id__2012	3	acc	0.569	± 0.0265
		exam_id__2013	3	acc	0.5833	± 0.0274
		exam_id__2016_2	3	acc	0.5203	± 0.026
		exam_id__2011	3	acc	0.6325	± 0.0257
		exam_id__2023	3	acc	0.5778	± 0.0246
		exam_id__2016	3	acc	0.595	± 0.0258
		exam_id__2017	3	acc	0.5517	± 0.0267
		exam_id__2015	3	acc	0.563	± 0.0261
faquad_nli	1.1	all	15	f1_macro	0.6424	± 0.0138
		all	15	acc	0.6769	± 0.013
hatebr_offensive_binary	1	all	25	f1_macro	0.8361	± 0.007
		all	25	acc	0.8371	± 0.007
oab_exams	1.5	all	3	acc	0.3841	± 0.006
		exam_id__2011-03	3	acc	0.3636	± 0.0279
		exam_id__2014-14	3	acc	0.475	± 0.0323
		exam_id__2016-21	3	acc	0.4125	± 0.0318
		exam_id__2012-06a	3	acc	0.3875	± 0.0313
		exam_id__2014-13	3	acc	0.325	± 0.0303
		exam_id__2015-16	3	acc	0.425	± 0.032
		exam_id__2010-02	3	acc	0.4	± 0.0283
		exam_id__2012-08	3	acc	0.3875	± 0.0314
		exam_id__2011-05	3	acc	0.375	± 0.0312
		exam_id__2017-22	3	acc	0.4	± 0.0316
		exam_id__2018-25	3	acc	0.4125	± 0.0318
		exam_id__2012-09	3	acc	0.3636	± 0.0317
		exam_id__2017-24	3	acc	0.3375	± 0.0304
		exam_id__2016-20a	3	acc	0.3125	± 0.0299
		exam_id__2012-06	3	acc	0.425	± 0.0318
		exam_id__2013-12	3	acc	0.4375	± 0.0321
		exam_id__2016-20	3	acc	0.45	± 0.0322
		exam_id__2013-11	3	acc	0.4	± 0.0316
		exam_id__2015-17	3	acc	0.4231	± 0.0323
		exam_id__2015-18	3	acc	0.4	± 0.0316
		exam_id__2017-23	3	acc	0.35	± 0.0308
		exam_id__2010-01	3	acc	0.2471	± 0.0271
		exam_id__2011-04	3	acc	0.375	± 0.0313
		exam_id__2016-19	3	acc	0.4103	± 0.0321
		exam_id__2013-10	3	acc	0.3375	± 0.0305
		exam_id__2012-07	3	acc	0.3625	± 0.031
		exam_id__2014-15	3	acc	0.3846	± 0.0318
portuguese_hate_speech_binary	1	all	25	f1_macro	0.6187	± 0.0119
		all	25	acc	0.6322	± 0.0117
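The Value and Stderr columns can be turned into approximate confidence intervals when comparing scores. The sketch below assumes that Stderr is a standard error and that a normal approximation (z = 1.96) is reasonable. Both are assumptions, not something stated in the table. The function names are hypothetical.

```python
# Approximate 95% confidence intervals from the "Value ± Stderr" columns.
# Assumes Stderr is a standard error and a normal approximation (z = 1.96).

Z_95 = 1.96


def ci95(value: float, stderr: float) -> tuple[float, float]:
    """Return (lower, upper) bounds of an approximate 95% interval."""
    half_width = Z_95 * stderr
    return value - half_width, value + half_width


def intervals_overlap(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """True if two (lower, upper) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


if __name__ == "__main__":
    # ENEM 2014 vs ENEM 2016_2 accuracy, values copied from the table above.
    enem_2014 = ci95(0.633, 0.0268)
    enem_2016_2 = ci95(0.5203, 0.026)
    print(enem_2014, enem_2016_2)
    # Non-overlapping 95% intervals are a conservative hint of a real difference.
    print("overlap:", intervals_overlap(enem_2014, enem_2016_2))  # overlap: False
```

Rows reported as "± N/A", such as the assin2_sts mse, have no standard error and cannot be treated this way.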

Open Portuguese LLM Leaderboard Evaluation Results

Detailed results can be found here

Metric	Value
Average	65.1
ENEM Challenge (No Images)	60.81
BLUEX (No Images)	46.87
OAB Exams	38.59
Assin2 RTE	90.27
Assin2 STS	72.25
FaQuAD NLI	64.35
HateBR Binary	83.15
PT Hate Speech Binary	64.82
tweetSentBR	64.80
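The reported Average is consistent with an unweighted mean of the nine task scores, and the short check below recomputes it. That the leaderboard uses a plain arithmetic mean is an assumption inferred from the numbers, not something stated here.

```python
# Leaderboard scores copied from the table above.
scores = {
    "ENEM Challenge (No Images)": 60.81,
    "BLUEX (No Images)": 46.87,
    "OAB Exams": 38.59,
    "Assin2 RTE": 90.27,
    "Assin2 STS": 72.25,
    "FaQuAD NLI": 64.35,
    "HateBR Binary": 83.15,
    "PT Hate Speech Binary": 64.82,
    "tweetSentBR": 64.80,
}

# Unweighted arithmetic mean over the nine tasks (assumed aggregation rule).
average = sum(scores.values()) / len(scores)
print(f"Average: {average:.2f}")  # Average: 65.10
```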