Description
This is a Polish GPT-2 model with the small architecture.
The model was released on 11.08.2023 and is now deprecated.
A new version of this model (radlab/polish-gpt2-small-v2) is available at https://huggingface.co/radlab/polish-gpt2-small-v2
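A minimal text-generation sketch using the transformers library; the repository id radlab/polish-gpt2-small is assumed from this model card (for new projects, prefer the v2 repository linked above):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repository id of this (deprecated) small model.
model_id = "radlab/polish-gpt2-small"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Generate a short Polish continuation from a prompt.
prompt = "Warszawa jest stolicą"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```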
Datasets
Data used to train this model:
- clarin-knext/msmarco-pl
- clarin-knext/nq-pl
- clarin-knext/hotpotqa-pl
- clarin-knext/scidocs-pl
- clarin-knext/nfcorpus-pl
- clarin-knext/dbpedia-pl
- clarin-knext/trec-covid-pl
- clarin-knext/quora-pl
- clarin-knext/arguana-pl
- clarin-knext/fiqa-pl
- our own corpora, not yet published
In total, this is about 10.5 GB of data.
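The publicly available clarin-knext corpora listed above can be pulled with the datasets library. A small sketch; the configuration name "corpus" is an assumption, so check each dataset card on the Hub for the exact configuration and split names:

```python
from datasets import load_dataset

# One of the publicly released training corpora listed above.
# The "corpus" configuration name is an assumption; consult the
# dataset card for the exact configuration/split names.
ds = load_dataset("clarin-knext/msmarco-pl", "corpus")
print(ds)                      # shows the available splits and their sizes
first_split = next(iter(ds))   # name of the first split
print(ds[first_split][0])      # first record of that split
```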
Metrics from W&B
- train/loss: 2.9569
- train/train_samples_per_second: 31.797
- train/epoch: 20
- train/train_steps_per_second: 3.18
- train/total_flos: 16645483478384640000
- train/train_loss: 3.106043342053213
- train/learning_rate: 2.2070550413783577e-8
- train/global_step: 3185240
- train/train_runtime: 1001735.8967
- eval/samples_per_second: 57.896
- eval/runtime: 1447.4458
- eval/steps_per_second: 5.79
- eval/loss: 2.890829086303711
- eval/accuracy: 0.4637797431547294
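For context, assuming the reported eval/loss is the usual per-token cross-entropy in nats, it translates to perplexity via exp(loss):

```python
import math

eval_loss = 2.890829086303711      # eval/loss reported above
perplexity = math.exp(eval_loss)   # perplexity = exp(cross-entropy)
print(f"eval perplexity ≈ {perplexity:.1f}")  # ≈ 18.0
```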
Changelog
- 11.08.2023: published the first release of the model.