---
language: nl
widget:
- text: "In het jaar 2030 zullen we"
- text: "Toen ik gisteren volledig in de ban was van"
- text: "Studenten en leraren van de Bogazici Universiteit in de Turkse stad Istanbul"
- text: "In Israël was een strenge lockdown"
tags:
- gpt-neo-1.3B
- gpt-neo
pipeline_tag: text-generation
datasets:
- yhavinga/mc4_nl_cleaned
---
# GPT Neo 1.3B pre-trained on cleaned Dutch mC4 🇳🇱
*NB: Training in progress.*
Dataset:
* [mC4 NL Cleaned](https://huggingface.co/datasets/yhavinga/mc4_nl_cleaned)
* dataset config: tiny (3B tokens)
* dataset config: large (24B tokens)
Tokenizer:
* Tokenizer trained on mC4 with scripts from the Hugging Face Transformers [Flax examples](https://github.com/huggingface/transformers/tree/master/examples/flax/language-modeling)
Training details:
* Trained for 70K steps (batch size 64) to perplexity 27 on mC4 NL tiny (1 epoch)
* Trained for 900K steps (batch size 16) to perplexity 16.2 on mC4 NL full
* Training is ongoing
* Block size: 512
* Optimizer: Adafactor
* Learning rate: 5e-5
* Warmup steps: 5000
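The training numbers above can be sanity-checked with a little arithmetic. The sketch below is illustrative only and rests on stated assumptions: it treats the listed batch sizes as the global batch per optimizer step (if they are per TPU device, multiply by the device count), assumes every sequence is a full 512-token block, and assumes the learning rate warms up linearly to 5e-5 over 5000 steps and is then held constant. The card does not state the post-warmup schedule, so that part is a guess. The helper names `tokens_seen`, `loss_from_perplexity` and `warmup_lr` are hypothetical. It also uses the standard relation perplexity = exp(mean per-token cross-entropy).

```python
import math


def tokens_seen(steps: int, batch_size: int, block_size: int = 512) -> int:
    """Tokens processed, ASSUMING batch_size is the global batch and every
    example is a full block of block_size tokens (no padding)."""
    return steps * batch_size * block_size


def loss_from_perplexity(ppl: float) -> float:
    """Mean per-token cross-entropy in nats, since perplexity = exp(loss)."""
    return math.log(ppl)


def warmup_lr(step: int, peak_lr: float = 5e-5, warmup_steps: int = 5000) -> float:
    """Linear warmup from 0 to peak_lr over warmup_steps.

    ASSUMPTION: the rate is held constant after warmup; the model card
    does not say whether or how it decays.
    """
    return peak_lr * min(1.0, step / warmup_steps)


if __name__ == "__main__":
    # (dataset config, steps, batch size, reported perplexity) from the card
    runs = [("tiny", 70_000, 64, 27.0), ("full", 900_000, 16, 16.2)]
    for name, steps, batch_size, ppl in runs:
        n = tokens_seen(steps, batch_size)
        loss = loss_from_perplexity(ppl)
        print(f"{name}: {n / 1e9:.2f}B tokens, loss ~ {loss:.3f} nats/token")
    print(f"lr at step 2500: {warmup_lr(2500):.1e}")
```

Under these assumptions the two runs correspond to roughly 2.3B and 7.4B tokens, and the reported perplexities of 27 and 16.2 to mean losses of about 3.30 and 2.79 nats per token.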
Work in progress (January 2022).
* Many thanks to the [Google TPU Research Cloud](https://sites.research.google/trc/about/) for providing access to a TPU cluster!
* Thanks to @gsarti for creating the [t5-flax-gcp repository](https://github.com/gsarti/t5-flax-gcp).
* Also thanks to the creators of [gpt2-medium-persian](https://huggingface.co/flax-community/gpt2-medium-persian) and
[gpt2-medium-indonesian](https://huggingface.co/flax-community/gpt2-medium-indonesian)
for sharing their training scripts!