jarodrigues
committed on
Update README.md
README.md
CHANGED
@@ -119,8 +119,8 @@ As codebase, we resorted to the [DeBERTa V2 xxlarge](https://huggingface.co/micr
 
 To train **Albertina 1.5B PTPT 256**, the data set was tokenized with the original DeBERTa tokenizer with a 128-token sequence
 truncation and dynamic padding for 250k steps and 256-token sequence-truncation for 80k steps.
-These steps correspond to the equivalent setup of 48 hours on a2-megagpu-16gb Google Cloud A2 node for the 128-token input sequences
-input sequences
+These steps correspond to the equivalent setup of 48 hours on a2-megagpu-16gb Google Cloud A2 node for the 128-token input sequences and 24 hours of computation for the 256-token
+input sequences.
 We opted for a learning rate of 1e-5 with linear decay and 10k warm-up steps.
 
 <br>
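The README text in this hunk mentions a learning rate of 1e-5 with linear decay and 10k warm-up steps. As a rough illustration only, the sketch below computes such a schedule in plain Python. The function name `linear_warmup_decay_lr` is hypothetical, and two things are assumptions not stated in the README: that the rate decays all the way to zero, and that the schedule spans the 250k steps of the 128-token phase (the README does not say whether the 80k-step 256-token phase shares the same schedule). The shape is the same as the one produced by `get_linear_schedule_with_warmup` in Transformers, but this is not the authors' training code.

```python
def linear_warmup_decay_lr(step, peak_lr=1e-5, warmup_steps=10_000, total_steps=250_000):
    """Learning rate at `step`: linear warm-up to `peak_lr`, then linear decay.

    Assumption (not from the README): the decay ends at zero on `total_steps`,
    and `total_steps` covers only the 128-token phase.
    """
    if step < 0 or step > total_steps:
        raise ValueError("step must lie in [0, total_steps]")
    if step < warmup_steps:
        # Warm-up: scale linearly from 0 up to peak_lr.
        return peak_lr * step / warmup_steps
    # Decay: scale linearly from peak_lr down to 0 at total_steps.
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)


if __name__ == "__main__":
    # Print the rate at a few points along the schedule.
    for s in (0, 5_000, 10_000, 130_000, 250_000):
        print(s, linear_warmup_decay_lr(s))
```

Under these assumptions the rate peaks at step 10,000 and is back at half the peak by step 130,000, the midpoint of the decay phase.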