Update README.md
README.md (changed)
@@ -34,7 +34,7 @@ GPT-2 (*at first*) is a transformers model pretrained on a very large corpus of
 
 This is the **medium version** of GPT-2, with 380M parameters.
 
-You can find other pretrained versions here: [gpt2-base](), [gpt2-large]()
+You can find other pretrained versions here: [gpt2-base](https://huggingface.co/chronopt-research/vietnamese-gpt2-base), [gpt2-large]()
 
 ## Dataset used for pretraining
 This is a combination of multiple Vietnamese datasets used for pretraining causal language models (CLMs) such as GPT, GPT-2, etc.
@@ -51,8 +51,8 @@ You can find out the combined version here: [duongttr/vi-dataset-for-pretrain](h
 We trained the model for ~100k steps with `lr=1e-4`, `bs=1920`, and `optimizer=adamw` on a TPU v3-8 VM from the [TRC Program](https://sites.research.google/trc/about/). Training took around **2.5 days**.
 |Model|Eval Loss|Eval Perplexity|
 |---|---|---|
-|gpt2-base
-|
+|gpt2-base|3.939|51.35|
+|**gpt2-medium**|**2.8676**|**17.5948**|
 |gpt2-large|-|-|
 
 ## Contacts
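The eval-perplexity column added in this diff is consistent with perplexity being the exponential of the eval (cross-entropy) loss, which a quick check confirms:

```python
# Perplexity = exp(eval loss); the values in the table match this relation.
import math

print(math.exp(3.939))   # ~= 51.37  (gpt2-base:   table reports 51.35)
print(math.exp(2.8676))  # ~= 17.59  (gpt2-medium: table reports 17.5948)
```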
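The hyperparameters quoted in the training line (`lr=1e-4`, `bs=1920`, `optimizer=adamw`, ~100k steps) could be expressed roughly as a `transformers.TrainingArguments` configuration, as in the sketch below. This is only an illustration: the README does not state the actual training script, learning-rate schedule, or per-device batch size, so those values are assumptions.

```python
# Illustrative sketch only: maps the hyperparameters quoted in the README onto
# transformers.TrainingArguments. The real training setup (script, schedule,
# per-device batch size / gradient accumulation) is not given in the README.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="vietnamese-gpt2-medium",
    max_steps=100_000,                 # "~100k steps"
    learning_rate=1e-4,                # lr=1e-4
    optim="adamw_torch",               # optimizer=adamw
    per_device_train_batch_size=240,   # assumption: 240 x 8 TPU v3 cores = 1920 global batch
    logging_steps=500,
    save_steps=10_000,
)
```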
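For completeness, a minimal usage sketch for loading the checkpoint with 🤗 Transformers follows. The repo id `chronopt-research/vietnamese-gpt2-medium` is an assumption inferred from the gpt2-base link added in this diff; substitute the actual model id if it differs.

```python
# Minimal usage sketch. The repo id below is assumed by analogy with the
# gpt2-base link in the README and may need to be adjusted.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "chronopt-research/vietnamese-gpt2-medium"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Hà Nội là"  # "Hanoi is ..."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```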