Update README.md
README.md (changed)
@@ -34,7 +34,7 @@ GPT-2 (*at first*) is a transformers model pretrained on a very large corpus of
 
 This is the **medium version** of GPT-2, with 380M parameters.
 
-You can find other pretrained versions here: [gpt2-base](), [gpt2-large]()
+You can find other pretrained versions here: [gpt2-base](https://huggingface.co/chronopt-research/vietnamese-gpt2-base), [gpt2-large]()
 
 ## Dataset used for pretraining
 This is a combination of multiple Vietnamese datasets used for pretraining causal language models (CLMs) such as GPT, GPT-2, etc.
@@ -51,8 +51,8 @@ You can find out the combined version here: [duongttr/vi-dataset-for-pretrain](h
 We trained the model for ~100k steps with `lr=1e-4`, `bs=1920`, and `optimizer=adamw` on a TPU v3-8 VM from the [TRC Program](https://sites.research.google/trc/about/). Training took around **2.5 days**.
 |Model|Eval Loss|Eval Perplexity|
 |---|---|---|
-|gpt2-base
-|
+|gpt2-base|3.939|51.35|
+|**gpt2-medium**|**2.8676**|**17.5948**|
 |gpt2-large|-|-|
 
 ## Contacts
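The eval-perplexity column added in this diff is consistent with perplexity being the exponential of the eval (cross-entropy) loss, which a quick check confirms:

```python
# Perplexity = exp(eval loss); the values in the table match this relation.
import math

print(math.exp(3.939))   # ~= 51.37  (gpt2-base:   table reports 51.35)
print(math.exp(2.8676))  # ~= 17.59  (gpt2-medium: table reports 17.5948)
```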
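The hyperparameters quoted in the training line (`lr=1e-4`, `bs=1920`, `optimizer=adamw`, ~100k steps) could be expressed roughly as a `transformers.TrainingArguments` configuration, as in the sketch below. This is only an illustration: the README does not state the actual training script, learning-rate schedule, or per-device batch size, so those values are assumptions.

```python
# Illustrative sketch only: maps the hyperparameters quoted in the README onto
# transformers.TrainingArguments. The real training setup (script, schedule,
# per-device batch size / gradient accumulation) is not given in the README.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="vietnamese-gpt2-medium",
    max_steps=100_000,                 # "~100k steps"
    learning_rate=1e-4,                # lr=1e-4
    optim="adamw_torch",               # optimizer=adamw
    per_device_train_batch_size=240,   # assumption: 240 x 8 TPU v3 cores = 1920 global batch
    logging_steps=500,
    save_steps=10_000,
)
```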
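For completeness, a minimal usage sketch for loading the checkpoint with 🤗 Transformers follows. The repo id `chronopt-research/vietnamese-gpt2-medium` is an assumption inferred from the gpt2-base link added in this diff; substitute the actual model id if it differs.

```python
# Minimal usage sketch. The repo id below is assumed by analogy with the
# gpt2-base link in the README and may need to be adjusted.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "chronopt-research/vietnamese-gpt2-medium"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Hà Nội là"  # "Hanoi is ..."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```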