Description
This is a Polish GPT-2 model with the small architecture.
The model was released on 11.08.2023 and is now deprecated.
A new version of this model (radlab/polish-gpt2-small-v2) is available at https://huggingface.co/radlab/polish-gpt2-small-v2
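A minimal text-generation sketch using the transformers library; the repository id radlab/polish-gpt2-small is assumed from this model card (for new projects, prefer the v2 repository linked above):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repository id of this (deprecated) small model.
model_id = "radlab/polish-gpt2-small"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Generate a short Polish continuation from a prompt.
prompt = "Warszawa jest stolicą"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```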
Datasets
Data used to train this model:
- clarin-knext/msmarco-pl
- clarin-knext/nq-pl
- clarin-knext/hotpotqa-pl
- clarin-knext/scidocs-pl
- clarin-knext/nfcorpus-pl
- clarin-knext/dbpedia-pl
- clarin-knext/trec-covid-pl
- clarin-knext/quora-pl
- clarin-knext/arguana-pl
- clarin-knext/fiqa-pl
- our own corpora, not yet published
In total, this is about 10.5 GB of data.
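The publicly available clarin-knext corpora listed above can be pulled with the datasets library. A small sketch; the configuration name "corpus" is an assumption, so check each dataset card on the Hub for the exact configuration and split names:

```python
from datasets import load_dataset

# One of the publicly released training corpora listed above.
# The "corpus" configuration name is an assumption; consult the
# dataset card for the exact configuration/split names.
ds = load_dataset("clarin-knext/msmarco-pl", "corpus")
print(ds)                      # shows the available splits and their sizes
first_split = next(iter(ds))   # name of the first split
print(ds[first_split][0])      # first record of that split
```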
Metrics from W&B
- train/loss: 2.9569
- train/train_samples_per_second: 31.797
- train/epoch: 20
- train/train_steps_per_second: 3.18
- train/total_flos: 16645483478384640000
- train/train_loss: 3.106043342053213
- train/learning_rate: 2.2070550413783577e-8
- train/global_step: 3185240
- train/train_runtime: 1001735.8967
- eval/samples_per_second: 57.896
- eval/runtime: 1447.4458
- eval/steps_per_second: 5.79
- eval/loss: 2.890829086303711
- eval/accuracy: 0.4637797431547294
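For context, assuming the reported eval/loss is the usual per-token cross-entropy in nats, it translates to perplexity via exp(loss):

```python
import math

eval_loss = 2.890829086303711      # eval/loss reported above
perplexity = math.exp(eval_loss)   # perplexity = exp(cross-entropy)
print(f"eval perplexity ≈ {perplexity:.1f}")  # ≈ 18.0
```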
Changelog
- 11.08.2023: published the first release of the model.