duongttr committed
Commit c3058d0
1 Parent(s): 6a22eaa

Update README.md

Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -34,7 +34,7 @@ GPT-2 (*at first*) is a transformers model pretrained on a very large corpus of
 
  This is the **medium version** of GPT-2, with 380M parameters.
 
- You could've found other pretrained version from here: [gpt2-base](), [gpt2-large]()
+ You could've found other pretrained version from here: [gpt2-base](https://huggingface.co/chronopt-research/vietnamese-gpt2-base), [gpt2-large]()
 
  ## Dataset used for pretraining
  This is a combination of multiple Vietnamese dataset for pretraining CLMs such as GPT, GPT2, etc.
@@ -51,8 +51,8 @@ You can find out the combined version here: [duongttr/vi-dataset-for-pretrain](h
  We trained the model ~100k steps, with `lr=1e-4`, `bs=1920`, `optimizer=adamw` on TPU-VM-3.8 from [TRC Program](https://sites.research.google/trc/about/). The training costs around **2.5 days**.
  |Model|Eval Loss|Eval Perplexity|
  |---|---|---|
- |gpt2-base|-|-|
- |gpt2-medium|2.8676|17.5948|
+ |gpt2-base|3.939|51.35|
+ |**gpt2-medium**|**2.8676**|**17.5948**|
  |gpt2-large|-|-|
 
  ## Contacts
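
For reference, a GPT-2 checkpoint like the one this README describes is normally loaded with the Hugging Face `transformers` causal-LM classes. This is a minimal sketch; the repository ID `chronopt-research/vietnamese-gpt2-medium` is an assumption inferred from the linked base model and is not stated in this commit.

```python
# Minimal sketch of loading a GPT-2-style causal LM from the Hugging Face Hub.
# NOTE: the repo ID below is assumed; only the base-model link appears in the diff.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "chronopt-research/vietnamese-gpt2-medium"  # assumed repo ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Generate a short continuation from a Vietnamese prompt.
inputs = tokenizer("Hà Nội là", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```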
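The training recipe mentioned in the second hunk (`lr=1e-4`, `bs=1920`, `optimizer=adamw`, ~100k steps on a TPU-VM v3-8) maps roughly onto a `transformers` `TrainingArguments` configuration like the sketch below; the split of the global batch size into per-device batch size and gradient-accumulation steps is an assumption, since only the total is stated.

```python
from transformers import TrainingArguments

# Hedged sketch of the stated hyperparameters. The decomposition of the global
# batch size (1920) into per-device batch x devices x accumulation is assumed.
training_args = TrainingArguments(
    output_dir="vietnamese-gpt2-medium",
    max_steps=100_000,                  # ~100k steps
    learning_rate=1e-4,                 # lr=1e-4
    per_device_train_batch_size=30,     # assumed: 30 x 8 TPU cores x 8 accumulation = 1920
    gradient_accumulation_steps=8,      # assumed
    optim="adamw_torch",                # optimizer=adamw
)
```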
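The two metric columns in the updated table are directly related: eval perplexity is the exponential of the eval (cross-entropy) loss, which the reported numbers reproduce up to rounding.

```python
import math

# Perplexity = exp(cross-entropy loss), matching the table values up to rounding.
print(math.exp(2.8676))  # ~17.59 for gpt2-medium
print(math.exp(3.939))   # ~51.4 for gpt2-base
```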