Update README.md
README.md (changed)
@@ -166,19 +166,8 @@ re-formatting practices, including removing repetitive/non-informative text like
 The texts are tokenized using the **GPT2** byte-level version of Byte Pair Encoding (BPE) (for unicode characters) and a
 vocabulary size of 180B. The inputs are sequences of 2048 consecutive tokens.
 
-The larger model was trained on 992 *80GB A100 GPUs*. The training duration was
-details of training.
-
-## Evaluation results
-
-TODO
-
-The model achieves the following results without any fine-tuning (zero-shot):
-
-| Dataset  | LAMBADA | LAMBADA | CBT-CN | CBT-NE | WikiText2 | PTB   | enwiki8 | text8 | WikiText103 | 1BW   |
-|:--------:|:-------:|:-------:|:------:|:------:|:---------:|:-----:|:-------:|:-----:|:-----------:|:-----:|
-| (metric) | (PPL)   | (ACC)   | (ACC)  | (ACC)  | (PPL)     | (PPL) | (BPB)   | (BPC) | (PPL)       | (PPL) |
-|          | 35.13   | 45.99   | 87.65  | 83.4   | 29.41     | 65.85 | 1.16    | 1.17  | 37.50       | 75.20 |
-
+The larger model was trained on 992 *80GB A100 GPUs*. The training duration was roughly 33 days of continuous training.
+
 
 
 ### BibTeX entry and citation info
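For reference, the preprocessing described in the unchanged context lines (GPT-2 byte-level BPE tokenization, inputs packed into sequences of 2048 consecutive tokens) can be sketched roughly as below. This is a minimal illustration only: the `gpt2` tokenizer checkpoint and the packing helper are assumptions, not part of this repository's training pipeline.

```python
# Rough sketch of the preprocessing described above: GPT-2 byte-level BPE
# tokenization, with token ids packed into blocks of 2048 consecutive tokens.
# The "gpt2" checkpoint name is an assumption for illustration only.
from transformers import GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

text = "Example document drawn from the training corpus."
ids = tokenizer(text)["input_ids"]

# Split the concatenated token ids into fixed-length blocks of 2048.
block_size = 2048
blocks = [ids[i : i + block_size] for i in range(0, len(ids), block_size)]
print(f"{len(ids)} tokens -> {len(blocks)} block(s) of up to {block_size} tokens")
```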