TeenyTinyLlama

nicholasKluge's Collections


Updated Jan 31, 2024

TeenyTinyLlama is a pair of compact language models based on the Llama 2 architecture and trained on a Brazilian Portuguese corpus.


TeenyTinyLlama: open-source tiny language models trained in Brazilian Portuguese

Paper • 2401.16640 • Published Jan 30, 2024

🦙 TeenyTinyLlama-Chat (Space)

nicholasKluge/TeenyTinyLlama-460m

Text Generation • Updated Jan 15
Note 460 million-parameter version of the TeenyTinyLlama.
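The base checkpoints are listed under Text Generation, meaning they produce text one token at a time from next-token scores. The generation settings used with these models are not given here, so the following is only a generic sketch of one decoding step: temperature-scaled top-k sampling over a list of logits. The function name `sample_next_token`, its default values, and the example logits are illustrative assumptions.

```python
import math
import random

def sample_next_token(logits, temperature=1.0, top_k=50, rng=None):
    """Sample a token id from next-token logits with temperature scaling and top-k filtering."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if rng is None:
        rng = random.Random()
    # Keep only the ids of the k highest logits.
    k = max(1, min(top_k, len(logits)))
    top_ids = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the kept logits after dividing by the temperature (max subtracted for stability).
    scaled = [logits[i] / temperature for i in top_ids]
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Draw one id according to the resulting probabilities.
    return rng.choices(top_ids, weights=probs, k=1)[0]

logits = [2.0, 0.5, -1.0, 3.0]  # made-up scores for a 4-token vocabulary
print(sample_next_token(logits, temperature=0.7, top_k=2, rng=random.Random(0)))
```

Lower temperatures sharpen the distribution toward the highest-scoring token, and `top_k=1` reduces to greedy decoding.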
nicholasKluge/TeenyTinyLlama-460m-awq

Text Generation • Updated Jan 15

Note 460 million-parameter version (4-bit quantized via AWQ) of the TeenyTinyLlama.
nicholasKluge/TeenyTinyLlama-460m-Chat

Text Generation • Updated Jan 15

Note 460 million-parameter version of the TeenyTinyLlama fine-tuned on the Instruct-Aira Dataset version 2.0.
nicholasKluge/TeenyTinyLlama-460m-Chat-awq

Text Generation • Updated Jan 15

Note 460 million-parameter version of the TeenyTinyLlama fine-tuned on the Instruct-Aira Dataset version 2.0 (4-bit quantized via AWQ).
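The `-awq` checkpoints store weights in 4 bits. AWQ (activation-aware weight quantization) additionally rescales salient weight channels using activation statistics before quantizing; that search is omitted here. The sketch below shows only the plain group-wise, round-to-nearest 4-bit step, plus a back-of-the-envelope weight-storage estimate. The group size of 128 and all function names are assumptions, not the settings used for these checkpoints.

```python
def quantize_group_4bit(weights):
    """Asymmetric round-to-nearest 4-bit quantization of one group of weights."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 15 or 1.0  # 16 levels (0..15); fall back to 1.0 if all weights are equal
    zero = round(-lo / scale)       # integer zero point so that lo maps to level 0
    q = [max(0, min(15, round(w / scale) + zero)) for w in weights]
    return q, scale, zero

def dequantize_group_4bit(q, scale, zero):
    """Map 4-bit integer levels back to approximate float weights."""
    return [(v - zero) * scale for v in q]

def quantize_in_groups(weights, group_size=128):
    """Quantize a flat weight list in independent groups (group_size is an assumption)."""
    return [quantize_group_4bit(weights[i:i + group_size])
            for i in range(0, len(weights), group_size)]

def approx_weight_bytes(n_params, bits):
    """Rough storage for n_params weights at a given bit width, ignoring scales and zero points."""
    return n_params * bits / 8

w = [0.12, -0.5, 0.33, 0.9, -0.07, 0.0]
q, scale, zero = quantize_group_4bit(w)
print(q)
print(dequantize_group_4bit(q, scale, zero))
print(approx_weight_bytes(460_000_000, 16) / 1e9, "GB at 16-bit")  # 0.92 GB at 16-bit
print(approx_weight_bytes(460_000_000, 4) / 1e9, "GB at 4-bit")    # 0.23 GB at 4-bit
```

The estimate ignores the per-group scales and zero points, as well as any layers (such as embeddings) that may be kept in higher precision, so actual file sizes will differ from the 0.23 GB figure.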
nicholasKluge/TeenyTinyLlama-460m-HateBR

Text Classification • Updated Oct 8, 2024

Note 460 million-parameter version of the TeenyTinyLlama fine-tuned on the HateBR dataset.
nicholasKluge/TeenyTinyLlama-460m-FaQuAD-NLI

Text Classification • Updated Oct 8, 2024

Note 460 million-parameter version of the TeenyTinyLlama fine-tuned on the FaQuAD-NLI dataset.
nicholasKluge/TeenyTinyLlama-460m-IMDB

Text Classification • Updated Oct 8, 2024

Note 460 million-parameter version of the TeenyTinyLlama fine-tuned on the IMDB dataset.
nicholasKluge/TeenyTinyLlama-460m-Assin2

Text Classification • Updated Oct 8, 2024

Note 460 million-parameter version of the TeenyTinyLlama fine-tuned on the Assin2 dataset.
nicholasKluge/TeenyTinyLlama-460m-AgNews

Text Classification • Updated Oct 8, 2024

Note 460 million-parameter version of the TeenyTinyLlama fine-tuned on the AgNews dataset.
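The Text Classification checkpoints reuse the decoder-only backbone with a classification head. Assuming they follow the pattern of Hugging Face's `LlamaForSequenceClassification`, which scores the hidden state of the last non-padding token with a linear layer, the idea can be sketched in plain Python as follows. The toy hidden states, head weights, and function names are hypothetical, and right padding is assumed.

```python
def last_token_index(attention_mask):
    """Position of the last real (non-padding) token, assuming right padding."""
    return max(i for i, m in enumerate(attention_mask) if m == 1)

def classify(hidden_states, attention_mask, head_weights, head_bias):
    """Apply a linear head to the last real token's hidden state; return (label, logits)."""
    h = hidden_states[last_token_index(attention_mask)]
    logits = [sum(w * x for w, x in zip(row, h)) + b
              for row, b in zip(head_weights, head_bias)]
    label = max(range(len(logits)), key=logits.__getitem__)
    return label, logits

# Toy example: 4 positions x 3 hidden dims; the last position is padding.
hidden = [[0.1, 0.2, 0.3], [0.0, 0.5, -0.2], [1.0, -1.0, 0.5], [9.0, 9.0, 9.0]]
mask = [1, 1, 1, 0]
weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]  # one row per label (2 labels)
bias = [0.0, 0.0]
print(classify(hidden, mask, weights, bias))  # (0, [1.0, -0.5])
```

Reading the last real token rather than the last position matters: here the padded row of 9.0s would otherwise flip the prediction to label 1.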
nicholasKluge/TeenyTinyLlama-160m

Text Generation • Updated Jan 15
Note 160 million-parameter version of the TeenyTinyLlama.
nicholasKluge/TeenyTinyLlama-160m-HateBR

Text Classification • Updated Oct 8, 2024

Note 160 million-parameter version of the TeenyTinyLlama fine-tuned on the HateBR dataset.
nicholasKluge/TeenyTinyLlama-160m-FaQuAD-NLI

Text Classification • Updated Oct 8, 2024

Note 160 million-parameter version of the TeenyTinyLlama fine-tuned on the FaQuAD-NLI dataset.
nicholasKluge/TeenyTinyLlama-160m-IMDB

Text Classification • Updated Oct 8, 2024

Note 160 million-parameter version of the TeenyTinyLlama fine-tuned on the IMDB dataset.
nicholasKluge/TeenyTinyLlama-160m-Assin2

Text Classification • Updated Oct 8, 2024
nicholasKluge/TeenyTinyLlama-160m-AgNews

Text Classification • Updated Oct 8, 2024

Note 160 million-parameter version of the TeenyTinyLlama fine-tuned on the AgNews dataset.
nicholasKluge/Pt-Corpus

Viewer • Updated Jun 18, 2024 • 5.77M rows

Note Pt-Corpus is a concatenation of several portions of Brazilian Portuguese datasets found on the Hub, totaling approximately 4.1B tokens. This version contains no instructional content.
nicholasKluge/Pt-Corpus-tokenized

Viewer • Updated Jun 18, 2024 • 2.02M rows

Note Tokenized version of the Pt-Corpus (performed using the TeenyTinyLlama tokenizer).
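Pre-tokenized corpora like this one are commonly built by tokenizing each document, appending an end-of-sequence token, concatenating the results, and cutting the stream into fixed-length blocks. The page does not state the block length or end-of-sequence handling actually used, so the sketch below treats both as parameters; the function name `pack_into_blocks` and the token ids are made up for illustration.

```python
def pack_into_blocks(token_docs, block_size, eos_id):
    """Concatenate tokenized documents (each followed by eos_id) and split into full blocks."""
    stream = []
    for doc in token_docs:
        stream.extend(doc)
        stream.append(eos_id)  # mark the document boundary
    n_full = len(stream) // block_size
    # Only complete blocks are kept; the trailing remainder is discarded.
    return [stream[i * block_size:(i + 1) * block_size] for i in range(n_full)]

docs = [[5, 6, 7], [8, 9], [10, 11, 12, 13]]  # hypothetical token ids
print(pack_into_blocks(docs, block_size=4, eos_id=0))
# [[5, 6, 7, 0], [8, 9, 0, 10], [11, 12, 13, 0]]
```

Tokens left over after the last full block are dropped in this sketch; padding the final block is a common alternative.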
nicholasKluge/Pt-Corpus-Instruct

Viewer • Updated Jun 18, 2024 • 10.6M rows

Note Pt-Corpus-Instruct is a concatenation of several portions of Brazilian Portuguese datasets found on the Hub, totaling approximately 6.2B tokens. Unlike Pt-Corpus, it includes conversational and general instructional data.
nicholasKluge/Pt-Corpus-Instruct-tokenized-small

Updated Jun 18, 2024

Note Tokenized version of a subset of the Pt-Corpus-Instruct (performed using the TeenyTinyLlama tokenizer).
nicholasKluge/Pt-Corpus-Instruct-tokenized-large

Viewer • Updated Jun 18, 2024 • 3.06M rows

Note Tokenized version of the Pt-Corpus-Instruct (performed using the TeenyTinyLlama tokenizer).
nicholasKluge/instruct-aira-dataset-v2

Viewer • Updated Jun 18, 2024 • 163k rows
