simarora committed on
Commit
3115c36
1 Parent(s): ee991dd

Update README.md

Files changed (1)
  1. README.md +4 -4
README.md CHANGED
@@ -8,18 +8,18 @@ language:
 
 This model is pretrained as a reference baseline to the Based model provided here: https://huggingface.co/hazyresearch/based-1b-50b.
 
-Both checkpoints are pretrained on **50Bn tokens*** of the Pile in the exact same data order using next token prediction.
+Both checkpoints are pretrained on **50Bn tokens** of the Pile in the exact same data order using next token prediction.
 
 A WandB report for training is here: https://api.wandb.ai/links/hazy-research/ggo9rst2
 
 
 ### Model Sources
 
-The model is a standard Mamba model using the code provided here: https://github.com/state-spaces/mamba/tree/main/mamba_ssm
+The model is a standard Mamba model using the model code provided here: https://github.com/state-spaces/mamba/tree/main/mamba_ssm
 
-The training code is provided here for reproducing training: https://github.com/HazyResearch/based
+The training code is provided here and can be used to reproduce training: https://github.com/HazyResearch/based
 
-The paper for this work is here, and includes additional training details: https://arxiv.org/abs/2402.18668
+The paper for the work is here, and the appendix includes additional experimental details/hyperparameters: https://arxiv.org/abs/2402.18668
 
 
 ### Uses