ArthurZ (HF staff) committed on
Commit b5f37f5
1 Parent(s): 9694c14

Update README.md

Files changed (1)
  1. README.md +1 -12
README.md CHANGED
@@ -166,19 +166,8 @@ re-formatting practices, including removing repetitive/non-informative text like
  The texts are tokenized using the **GPT2** byte-level version of Byte Pair Encoding (BPE) (for unicode characters) and a
  vocabulary size of 180B. The inputs are sequences of 2048 consecutive tokens.

- The larger model was trained on 992 *80GB A100 GPUs*. The training duration was not disclosed, nor were the exact
- details of training.
+ The larger model was trained on 992 *80GB A100 GPUs*. The training duration was roughly ~33 days of continuous training.

- ## Evaluation results
-
- TODO
-
- The model achieves the following results without any fine-tuning (zero-shot):
-
- | Dataset  | LAMBADA | LAMBADA | CBT-CN | CBT-NE | WikiText2 | PTB   | enwiki8 | text8 | WikiText103 | 1BW   |
- |:--------:|:-------:|:-------:|:------:|:------:|:---------:|:-----:|:-------:|:-----:|:-----------:|:-----:|
- | (metric) | (PPL)   | (ACC)   | (ACC)  | (ACC)  | (PPL)     | (PPL) | (BPB)   | (BPC) | (PPL)       | (PPL) |
- |          | 35.13   | 45.99   | 87.65  | 83.4   | 29.41     | 65.85 | 1.16    | 1,17  | 37.50       | 75.20 |


  ### BibTeX entry and citation info
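
The context lines in the diff describe the model card's tokenization (GPT2-style byte-level BPE, 2048-token input sequences). As an illustrative sketch only, not part of this commit, the snippet below loads one of the public OPT checkpoints with the transformers library and shows that tokenization in action; the choice of facebook/opt-350m is an assumption made here for convenience, since the smaller checkpoints share the same tokenizer.

```python
# Illustrative sketch (not from the commit): the GPT2-style byte-level BPE
# tokenizer described in the model card, loaded via transformers.
# facebook/opt-350m is an assumed stand-in checkpoint; any OPT size works.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")

text = "Hello world, this is a test of the OPT tokenizer."
encoded = tokenizer(text)

print(encoded["input_ids"])                    # byte-level BPE token ids
print(tokenizer.decode(encoded["input_ids"]))  # decodes back to the original text
print(len(tokenizer))                          # size of the BPE vocabulary
```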