Update README.md
README.md
CHANGED
@@ -6,7 +6,7 @@ tags:
 - persian
 ---
 # GPT2-Persian
-bolbolzaban/gpt2-persian is gpt2 language model that is trained with hyper parameters similar to standard gpt2-medium with
+bolbolzaban/gpt2-persian is a GPT-2 language model trained with hyperparameters similar to the standard gpt2-medium, with the following differences:
 1. The context size is reduced from 1024 to 256 sub-words to make training affordable.
 2. Instead of BPE, Google's SentencePiece tokenizer is used for tokenization.
 3. The training dataset includes only Persian text. All non-Persian characters are replaced with special tokens (e.g. [LAT], [URL], [NUM]).
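For readers who want to try the model described in the updated text, here is a minimal usage sketch. It assumes the checkpoint is published on the Hugging Face Hub under the name bolbolzaban/gpt2-persian and can be loaded with the standard transformers text-generation pipeline; the Persian prompt and generation settings below are illustrative, not taken from the model card.

```python
# Minimal sketch: load bolbolzaban/gpt2-persian with the standard
# transformers text-generation pipeline. Assumes the checkpoint (weights
# plus SentencePiece tokenizer files) is available on the Hugging Face
# Hub under this name; the prompt and settings are illustrative only.
from transformers import pipeline

generator = pipeline("text-generation", model="bolbolzaban/gpt2-persian")

# Illustrative Persian prompt: "In a surprising turn of events, researchers"
prompt = "در یک اتفاق شگفت انگیز، پژوهشگران"

# The model was trained with a 256 sub-word context, so keep max_length <= 256.
outputs = generator(prompt, max_length=128, num_return_sequences=1)
print(outputs[0]["generated_text"])
```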
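The card does not say exactly how non-Persian characters were mapped to the special tokens mentioned in point 3. The snippet below is only a hypothetical illustration of that kind of replacement; the regular expressions and the replace_non_persian helper are made up here and are not the project's actual preprocessing code.

```python
import re

# Hypothetical illustration of the substitution described in point 3;
# not the project's actual preprocessing. A single pass over the text
# avoids re-matching the inserted [URL]/[LAT]/[NUM] tokens themselves.
_PATTERN = re.compile(r"https?://\S+|[A-Za-z]+|\d+")

def _token_for(match: re.Match) -> str:
    text = match.group(0)
    if text.startswith("http"):
        return "[URL]"
    if text[0].isdigit():
        return "[NUM]"
    return "[LAT]"  # Latin-script word

def replace_non_persian(text: str) -> str:
    return _PATTERN.sub(_token_for, text)

# Mixed Persian/Latin/digit example: "The Python site was launched in 2020"
print(replace_non_persian("سایت Python در سال 2020 راه‌اندازی شد"))
# -> سایت [LAT] در سال [NUM] راه‌اندازی شد
```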