jarodrigues committed · verified
Commit 8e4a911 · 1 Parent(s): 36ec97c

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -119,8 +119,8 @@ As codebase, we resorted to the [DeBERTa V2 xxlarge](https://huggingface.co/micr
 
 To train **Albertina 1.5B PTPT 256**, the data set was tokenized with the original DeBERTa tokenizer with a 128-token sequence
 truncation and dynamic padding for 250k steps and 256-token sequence-truncation for 80k steps.
-These steps correspond to the equivalent setup of 48 hours on a2-megagpu-16gb Google Cloud A2 node for the 128-token input sequences, 24 hours of computation for the 256-token
-input sequences and 24 hours of computation for the 512-token input sequences.
+These steps correspond to the equivalent setup of 48 hours on a2-megagpu-16gb Google Cloud A2 node for the 128-token input sequences and 24 hours of computation for the 256-token
+input sequences.
 We opted for a learning rate of 1e-5 with linear decay and 10k warm-up steps.
 
 <br>
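
For context, the training recipe summarized in the edited paragraph could be approximated with the Hugging Face `transformers` Trainer roughly as sketched below. This is a minimal sketch, not the authors' actual pipeline: the corpus file (`ptpt_corpus.txt`), the batch size, and the MLM objective are assumptions made for illustration; only the DeBERTa tokenizer, the 128-token truncation with dynamic padding, the 250k steps, and the 1e-5 learning rate with linear decay and 10k warm-up steps come from the README text.

```python
# Minimal sketch of the 128-token pre-training phase described in the README.
# Assumptions: MLM objective via the transformers Trainer, a local text corpus,
# and a per-device batch size of 8 (none of these are stated in the README).
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForMaskedLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Original DeBERTa V2 xxlarge tokenizer and codebase (per the README).
checkpoint = "microsoft/deberta-v2-xxlarge"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForMaskedLM.from_pretrained(checkpoint)

# Hypothetical PT-PT corpus file; replace with the actual training data.
raw = load_dataset("text", data_files={"train": "ptpt_corpus.txt"})

def tokenize(batch):
    # Truncate to 128 tokens for the first phase; padding is left to the
    # collator so each batch is padded dynamically to its longest sequence.
    return tokenizer(batch["text"], truncation=True, max_length=128)

train_ds = raw["train"].map(tokenize, batched=True, remove_columns=["text"])

collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)

args = TrainingArguments(
    output_dir="albertina-ptpt-128",
    max_steps=250_000,              # 128-token phase; a later 80k-step run at 256 tokens would follow
    learning_rate=1e-5,
    lr_scheduler_type="linear",     # linear decay
    warmup_steps=10_000,
    per_device_train_batch_size=8,  # assumed; not stated in the README
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    data_collator=collator,
)
trainer.train()
```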