kz-transformers
committed on
Update README.md
README.md
CHANGED
The Kaz-RoBERTa model was pretrained on the union of two datasets:
- [Conversational data] Preprocessed dialogs between the Customer Support Team and clients of [Beeline KZ (Veon Group)](https://beeline.kz/)

Together these datasets weigh 25GB of text.

## Training procedure

### Preprocessing

The texts are tokenized using a byte version of Byte-Pair Encoding (BPE) with a vocabulary size of 52,000. The inputs of the model take pieces of 512 contiguous tokens that may span over documents. The beginning of a new document is marked with `<s>` and the end of one by `</s>`.
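The snippet below is a minimal sketch of loading the tokenizer and checking these conventions; the model id `kz-transformers/kaz-roberta-conversational` is an assumption and may differ from the published one.

```python
# Sketch: load the pretrained tokenizer and inspect the BPE vocabulary and the
# <s>/</s> document markers. The model id below is an assumption.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("kz-transformers/kaz-roberta-conversational")

print(tokenizer.vocab_size)                       # ~52,000 BPE tokens
print(tokenizer.bos_token, tokenizer.eos_token)   # <s> and </s>

# Encoding a sentence wraps it in the document markers.
ids = tokenizer("Сәлем, әлем!")["input_ids"]      # "Hello, world!" in Kazakh
print(tokenizer.convert_ids_to_tokens(ids))
```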

### Pretraining

The model was trained on 2 V100 GPUs for 500K steps with a batch size of 128 and a sequence length of 512.
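Purely as an illustration of these hyperparameters (not the authors' actual training script), the totals map onto `transformers.TrainingArguments` roughly as follows; the per-device split is an assumption.

```python
# Illustrative sketch only: the reported totals (2 GPUs, 500K steps, global batch
# size of 128, sequence length 512) expressed as Trainer hyperparameters.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="kaz-roberta-pretraining",  # hypothetical output path
    max_steps=500_000,                     # 500K update steps
    per_device_train_batch_size=64,        # assumed split: 64 x 2 GPUs = 128 global
    fp16=True,                             # mixed precision on V100s (assumption)
)
# The 512-token sequence length is handled at tokenization time, not here.
```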
## Usage

You can use this model directly with a pipeline for masked language modeling:
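A minimal sketch is shown below; the model id `kz-transformers/kaz-roberta-conversational` is an assumption and may differ from the published one.

```python
# Sketch: fill-mask pipeline for Kaz-RoBERTa. The model id is an assumption.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="kz-transformers/kaz-roberta-conversational")

# "I speak the <mask> language." in Kazakh.
for prediction in fill_mask("Мен <mask> тілінде сөйлеймін."):
    print(prediction["token_str"], prediction["score"])
```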