kernelmachine committed
Commit 5cb8c21 · Parent: 4d67ac2

Update README.md

Files changed (1):
  1. README.md (+1, -1)
README.md CHANGED
@@ -45,7 +45,7 @@ We follow the model architecture of LLaMa, and we use the GPT-NeoX-20B tokenizer
 
 During training, we use 2,048 token sequences that are packed across document boundaries, and we pre-pend a beginning-of-text token to every document.
 
-We use weight decay of 0.1, the Adam optimizer with beta_2 0.95, 2,000 steps of warmup, with a cosine learning rate scheduler.
+We use weight decay of 0.1, the Adam optimizer with beta_2 of 0.95, 2,000 steps of warmup, with a cosine learning rate scheduler.
 
 
 | Model | #L | #H | d_model | LR | Batch |
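
For reference, here is a minimal PyTorch sketch of the optimizer and schedule described in the changed line. It is a sketch under stated assumptions, not the repository's training code: the total step count, peak learning rate, and beta_1 are not given here (per-model learning rates appear in the table above), and AdamW-style decoupled weight decay is a common reading of "Adam with weight decay", not something the diff confirms.

```python
import math

import torch

# Toy module standing in for the LLaMa-style transformer (hypothetical).
model = torch.nn.Linear(16, 16)

WARMUP_STEPS = 2_000   # warmup length from the changed line
TOTAL_STEPS = 100_000  # assumption: total steps are not stated here
PEAK_LR = 3e-4         # assumption: per-model LRs are in the table above

# Weight decay 0.1 and beta_2 = 0.95, as described in the diff. AdamW
# (decoupled decay) and beta_1 = 0.9 are assumptions; the text says "Adam".
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=PEAK_LR,
    betas=(0.9, 0.95),
    weight_decay=0.1,
)

def lr_lambda(step: int) -> float:
    """Linear warmup for 2,000 steps, then cosine decay to zero."""
    if step < WARMUP_STEPS:
        return step / WARMUP_STEPS
    progress = min(1.0, (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS))
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
```

In a training loop, each `optimizer.step()` would be followed by `scheduler.step()` so the multiplier advances once per optimization step.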