kernelmachine committed 5cb8c21 (parent: 4d67ac2)
Update README.md

README.md CHANGED
@@ -45,7 +45,7 @@ We follow the model architecture of LLaMa, and we use the GPT-NeoX-20B tokenizer
 During training, we use 2,048 token sequences that are packed across document boundaries, and we pre-pend a beginning-of-text token to every document.

-We use weight decay of 0.1, the Adam optimizer with beta_2 0.95, 2,000 steps of warmup, with a cosine learning rate scheduler.
+We use weight decay of 0.1, the Adam optimizer with beta_2 of 0.95, 2,000 steps of warmup, with a cosine learning rate scheduler.

 | Model | #L | #H | d_model | LR | Batch |
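For a concrete picture of the schedule described in the changed line, here is a minimal PyTorch sketch of an AdamW-style optimizer with weight decay 0.1 and beta_2 = 0.95, 2,000 warmup steps, and cosine learning-rate decay. This is not the authors' training code: beta_1 = 0.9, the peak learning rate, and the total step count are illustrative placeholders, not values from the README.

```python
# Minimal sketch (assumed, not the authors' implementation) of the optimizer
# and LR schedule described in the README: weight decay 0.1, Adam beta_2 = 0.95,
# 2,000 warmup steps, cosine decay. beta_1, peak LR, and total_steps are guesses.
import math
import torch

def build_optimizer_and_scheduler(model, peak_lr=3e-4, total_steps=100_000,
                                  warmup_steps=2_000):
    optimizer = torch.optim.AdamW(
        model.parameters(),
        lr=peak_lr,
        betas=(0.9, 0.95),   # beta_1 = 0.9 assumed; beta_2 = 0.95 per the README
        weight_decay=0.1,
    )

    def lr_lambda(step):
        # Linear warmup for the first 2,000 steps, then cosine decay to zero.
        if step < warmup_steps:
            return step / max(1, warmup_steps)
        progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
        return 0.5 * (1.0 + math.cos(math.pi * progress))

    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
    return optimizer, scheduler
```

In a training loop, `scheduler.step()` would be called once per optimizer step so the learning rate follows the warmup-then-cosine curve.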