jarodrigues committed · verified
Commit 8e4a911 · 1 Parent(s): 36ec97c

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -119,8 +119,8 @@ As codebase, we resorted to the [DeBERTa V2 xxlarge](https://huggingface.co/micr
 
 To train **Albertina 1.5B PTPT 256**, the data set was tokenized with the original DeBERTa tokenizer with a 128-token sequence
 truncation and dynamic padding for 250k steps and 256-token sequence-truncation for 80k steps.
-These steps correspond to the equivalent setup of 48 hours on a2-megagpu-16gb Google Cloud A2 node for the 128-token input sequences, 24 hours of computation for the 256-token
-input sequences and 24 hours of computation for the 512-token input sequences.
+These steps correspond to the equivalent setup of 48 hours on a2-megagpu-16gb Google Cloud A2 node for the 128-token input sequences and 24 hours of computation for the 256-token
+input sequences.
 We opted for a learning rate of 1e-5 with linear decay and 10k warm-up steps.
 
 <br>
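
For context, the training recipe summarized in the edited paragraph could be approximated with the Hugging Face `transformers` Trainer roughly as sketched below. This is a minimal sketch, not the authors' actual pipeline: the corpus file (`ptpt_corpus.txt`), the batch size, and the MLM objective are assumptions made for illustration; only the DeBERTa tokenizer, the 128-token truncation with dynamic padding, the 250k steps, and the 1e-5 learning rate with linear decay and 10k warm-up steps come from the README text.

```python
# Minimal sketch of the 128-token pre-training phase described in the README.
# Assumptions: MLM objective via the transformers Trainer, a local text corpus,
# and a per-device batch size of 8 (none of these are stated in the README).
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForMaskedLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Original DeBERTa V2 xxlarge tokenizer and codebase (per the README).
checkpoint = "microsoft/deberta-v2-xxlarge"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForMaskedLM.from_pretrained(checkpoint)

# Hypothetical PT-PT corpus file; replace with the actual training data.
raw = load_dataset("text", data_files={"train": "ptpt_corpus.txt"})

def tokenize(batch):
    # Truncate to 128 tokens for the first phase; padding is left to the
    # collator so each batch is padded dynamically to its longest sequence.
    return tokenizer(batch["text"], truncation=True, max_length=128)

train_ds = raw["train"].map(tokenize, batched=True, remove_columns=["text"])

collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)

args = TrainingArguments(
    output_dir="albertina-ptpt-128",
    max_steps=250_000,              # 128-token phase; a later 80k-step run at 256 tokens would follow
    learning_rate=1e-5,
    lr_scheduler_type="linear",     # linear decay
    warmup_steps=10_000,
    per_device_train_batch_size=8,  # assumed; not stated in the README
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    data_collator=collator,
)
trainer.train()
```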