kernelmachine committed 5cb8c21 (parent: 4d67ac2)
Update README.md

README.md CHANGED
@@ -45,7 +45,7 @@ We follow the model architecture of LLaMa, and we use the GPT-NeoX-20B tokenizer
 During training, we use 2,048 token sequences that are packed across document boundaries, and we pre-pend a beginning-of-text token to every document.

-We use weight decay of 0.1, the Adam optimizer with beta_2 0.95, 2,000 steps of warmup, with a cosine learning rate scheduler.
+We use weight decay of 0.1, the Adam optimizer with beta_2 of 0.95, 2,000 steps of warmup, with a cosine learning rate scheduler.

 | Model | #L | #H | d_model | LR | Batch |
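For a concrete picture of the schedule described in the changed line, here is a minimal PyTorch sketch of an AdamW-style optimizer with weight decay 0.1 and beta_2 = 0.95, 2,000 warmup steps, and cosine learning-rate decay. This is not the authors' training code: beta_1 = 0.9, the peak learning rate, and the total step count are illustrative placeholders, not values from the README.

```python
# Minimal sketch (assumed, not the authors' implementation) of the optimizer
# and LR schedule described in the README: weight decay 0.1, Adam beta_2 = 0.95,
# 2,000 warmup steps, cosine decay. beta_1, peak LR, and total_steps are guesses.
import math
import torch

def build_optimizer_and_scheduler(model, peak_lr=3e-4, total_steps=100_000,
                                  warmup_steps=2_000):
    optimizer = torch.optim.AdamW(
        model.parameters(),
        lr=peak_lr,
        betas=(0.9, 0.95),   # beta_1 = 0.9 assumed; beta_2 = 0.95 per the README
        weight_decay=0.1,
    )

    def lr_lambda(step):
        # Linear warmup for the first 2,000 steps, then cosine decay to zero.
        if step < warmup_steps:
            return step / max(1, warmup_steps)
        progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
        return 0.5 * (1.0 + math.cos(math.pi * progress))

    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
    return optimizer, scheduler
```

In a training loop, `scheduler.step()` would be called once per optimizer step so the learning rate follows the warmup-then-cosine curve.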