kz-transformers committed
Update README.md
README.md CHANGED
@@ -49,7 +49,8 @@ with `<s>` and the end of one by `</s>`
 
 ### Pretraining
 
-The model was trained on 2 V100 GPUs for 500K steps with a batch size of 128 and a sequence length of 512.
+The model was trained on 2 V100 GPUs for 500K steps with a batch size of 128 and a sequence length of 512, with an MLM probability of 15%, num_attention_heads=12,
+and num_hidden_layers=6.
 
 
 ### Contributions
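The MLM probability of 15% added in this change refers to the standard BERT-style masking objective. As a rough illustration only (not this repository's actual training code), here is a minimal sketch of that masking step in plain Python; `MASK_ID` and `VOCAB_SIZE` are placeholder values, since the real ids come from the tokenizer.

```python
import random

# Placeholder values for illustration; real ids come from the tokenizer.
MASK_ID = 4
VOCAB_SIZE = 32000

def mask_tokens(token_ids, mlm_probability=0.15, seed=0):
    """BERT-style MLM masking: select ~15% of positions, then
    80% -> [MASK], 10% -> random token, 10% left unchanged.
    Returns (masked inputs, labels); labels are -100 where no prediction is made."""
    rng = random.Random(seed)
    inputs = list(token_ids)
    labels = [-100] * len(inputs)  # -100 is conventionally ignored by the loss
    for i, tok in enumerate(token_ids):
        if rng.random() < mlm_probability:  # select ~15% of positions
            labels[i] = tok                 # predict the original token here
            r = rng.random()
            if r < 0.8:
                inputs[i] = MASK_ID         # 80%: replace with [MASK]
            elif r < 0.9:
                inputs[i] = rng.randrange(VOCAB_SIZE)  # 10%: random token
            # remaining 10%: keep the token unchanged
    return inputs, labels
```

In practice this step is usually handled by a data collator rather than written by hand; the sketch only makes the 15% figure concrete.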