Update README.md
README.md
CHANGED
@@ -8,18 +8,18 @@ language:
 
 This model is pretrained as a reference baseline to the Based model provided here: https://huggingface.co/hazyresearch/based-1b-50b.
 
-Both checkpoints are pretrained on **50Bn tokens
+Both checkpoints are pretrained on **50Bn tokens** of the Pile in the exact same data order using next token prediction.
 
 A WandB report for training is here: https://api.wandb.ai/links/hazy-research/ggo9rst2
 
 
 ### Model Sources
 
-The model is a standard Mamba model using the code provided here: https://github.com/state-spaces/mamba/tree/main/mamba_ssm
+The model is a standard Mamba model using the model code provided here: https://github.com/state-spaces/mamba/tree/main/mamba_ssm
 
-The training code is provided here
+The training code is provided here and can be used to reproduce training: https://github.com/HazyResearch/based
 
-The paper for
+The paper for the work is here, and the appendix includes additional experimental details/hyperparameters: https://arxiv.org/abs/2402.18668
 
 
 ### Uses