Update README.md
README.md CHANGED
@@ -13,6 +13,7 @@ Both checkpoints are pretrained on 50Bn tokens of the Pile in the exact same dat
 ### Model Sources

 The model is a standard Mamba model using the code provided here: https://github.com/state-spaces/mamba/tree/main/mamba_ssm
+
 The training code is provided here for reproducing training: https://github.com/HazyResearch/based

 The paper for this work is here, and includes additional training details: https://arxiv.org/abs/2402.18668
@@ -34,9 +35,11 @@ We include a series of benchmarks that you can use to evaluate quality:

 Please consider citing this paper if you use our work:

+```
 @article{arora2024simple,
 title={Simple linear attention language models balance the recall-throughput tradeoff},
 author={Arora, Simran and Eyuboglu, Sabri and Zhang, Michael and Timalsina, Aman and Alberti, Silas and Zinsley, Dylan and Zou, James and Rudra, Atri and Ré, Christopher},
 journal={arXiv:2402.18668},
 year={2024}
 }
+```
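Since the README describes the checkpoint as a standard Mamba model built on the `mamba_ssm` code linked above, loading it should follow the usual `mamba_ssm` pattern. The sketch below is a minimal, unverified example: the Hub repo id is a placeholder for this checkpoint's actual id, and the GPT-NeoX tokenizer is an assumption based on common practice for Pile-pretrained models, so verify both against the model card.

```python
# Minimal loading sketch, assuming `pip install mamba-ssm` and a CUDA device.
# The repo id below is a PLACEHOLDER, and the tokenizer is an assumption
# (Pile-pretrained models commonly pair with the GPT-NeoX tokenizer).
import torch
from transformers import AutoTokenizer
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")

model = MambaLMHeadModel.from_pretrained(
    "<this-checkpoint-hub-repo-id>",  # placeholder: substitute the real repo id
    device="cuda",
    dtype=torch.float16,
)

input_ids = tokenizer("The Pile is", return_tensors="pt").input_ids.to("cuda")
# `generate` here is mamba_ssm's own GenerationMixin; with the default
# `return_dict_in_generate=False` it returns the generated token ids directly.
out = model.generate(input_ids=input_ids, max_length=32)
print(tokenizer.decode(out[0]))
```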