Update README.md
README.md CHANGED
@@ -13,6 +13,7 @@ Both checkpoints are pretrained on 50Bn tokens of the Pile in the exact same dat
 ### Model Sources

 The model is a standard Mamba model using the code provided here: https://github.com/state-spaces/mamba/tree/main/mamba_ssm
+
 The training code is provided here for reproducing training: https://github.com/HazyResearch/based

 The paper for this work is here, and includes additional training details: https://arxiv.org/abs/2402.18668
@@ -34,9 +35,11 @@ We include a series of benchmarks that you can use to evaluate quality:

 Please consider citing this paper if you use our work:

+```
 @article{arora2024simple,
 title={Simple linear attention language models balance the recall-throughput tradeoff},
 author={Arora, Simran and Eyuboglu, Sabri and Zhang, Michael and Timalsina, Aman and Alberti, Silas and Zinsley, Dylan and Zou, James and Rudra, Atri and Ré, Christopher},
 journal={arXiv:2402.18668},
 year={2024}
 }
+```
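Since the README describes the checkpoint as a standard Mamba model built on the `mamba_ssm` code linked above, loading it should follow the usual `mamba_ssm` pattern. The sketch below is a minimal, unverified example: the Hub repo id is a placeholder for this checkpoint's actual id, and the GPT-NeoX tokenizer is an assumption based on common practice for Pile-pretrained models, so verify both against the model card.

```python
# Minimal loading sketch, assuming `pip install mamba-ssm` and a CUDA device.
# The repo id below is a PLACEHOLDER, and the tokenizer is an assumption
# (Pile-pretrained models commonly pair with the GPT-NeoX tokenizer).
import torch
from transformers import AutoTokenizer
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")

model = MambaLMHeadModel.from_pretrained(
    "<this-checkpoint-hub-repo-id>",  # placeholder: substitute the real repo id
    device="cuda",
    dtype=torch.float16,
)

input_ids = tokenizer("The Pile is", return_tensors="pt").input_ids.to("cuda")
# `generate` here is mamba_ssm's own GenerationMixin; with the default
# `return_dict_in_generate=False` it returns the generated token ids directly.
out = model.generate(input_ids=input_ids, max_length=32)
print(tokenizer.decode(out[0]))
```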