tim-lawson committed on
Commit f707dbf
1 Parent(s): 07c2361

Push model using huggingface_hub.

Files changed (3)
  1. README.md +47 -0
  2. config.json +20 -0
  3. model.safetensors +3 -0
README.md ADDED
@@ -0,0 +1,47 @@
+---
+language: en
+library_name: mlsae
+license: mit
+tags:
+- arxiv:2409.04185
+- model_hub_mixin
+- pytorch_model_hub_mixin
+---
+
+# Model Card for tim-lawson/sae-pythia-410m-deduped-x64-k32-tfm-layers-21
+
+A Multi-Layer Sparse Autoencoder (MLSAE) trained on the residual stream activation
+vectors from [EleutherAI/pythia-410m-deduped](https://huggingface.co/EleutherAI/pythia-410m-deduped) with an
+expansion factor of R = 64 and sparsity k = 32, over 1 billion
+tokens from [monology/pile-uncopyrighted](https://huggingface.co/datasets/monology/pile-uncopyrighted).
+
+
+This model is a PyTorch Lightning MLSAETransformer module, which includes the underlying
+transformer.
+
+
+### Model Sources
+
+- **Repository:** <https://github.com/tim-lawson/mlsae>
+- **Paper:** <https://arxiv.org/abs/2409.04185>
+- **Weights & Biases:** <https://wandb.ai/timlawson-/mlsae>
+
+## Citation
+
+**BibTeX:**
+
+```bibtex
+@misc{lawson_residual_2024,
+title = {Residual {{Stream Analysis}} with {{Multi-Layer SAEs}}},
+author = {Lawson, Tim and Farnik, Lucy and Houghton, Conor and Aitchison, Laurence},
+year = {2024},
+month = oct,
+number = {arXiv:2409.04185},
+eprint = {2409.04185},
+primaryclass = {cs},
+publisher = {arXiv},
+doi = {10.48550/arXiv.2409.04185},
+urldate = {2024-10-08},
+archiveprefix = {arXiv}
+}
+```
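The card gives R = 64 and k = 32 but does not name the activation function. The `k` and `auxk` keys in config.json suggest a TopK activation, so the sketch below assumes one: with the 1024-wide residual stream of pythia-410m-deduped, R = 64 gives 65,536 latents, of which at most 32 are nonzero per token. This is a pure-Python toy for illustration, not the mlsae package's code; the helper names (`n_latents`, `topk_relu`, `forward`) and the subtraction of the decoder bias before encoding are assumptions.

```python
"""Toy forward pass of a TopK sparse autoencoder (illustrative sketch).

Assumptions: the activation is TopK (keep the k largest pre-activations),
and the decoder bias is subtracted before encoding, which is one common
convention. This is not the mlsae package's implementation.
"""
import random

D_MODEL = 1024         # residual stream width of pythia-410m-deduped
EXPANSION_FACTOR = 64  # R in the model card
K = 32                 # active latents per token, k in the model card


def n_latents(d_model: int, expansion_factor: int) -> int:
    """Dictionary size: R times the residual stream width."""
    return d_model * expansion_factor


def topk_relu(pre: list[float], k: int) -> list[float]:
    """Keep the k largest pre-activations (clamped at zero); zero the rest."""
    top = sorted(range(len(pre)), key=lambda i: pre[i], reverse=True)[:k]
    out = [0.0] * len(pre)
    for i in top:
        out[i] = max(pre[i], 0.0)
    return out


def matvec(w: list[list[float]], x: list[float]) -> list[float]:
    """Multiply a row-major matrix by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def forward(x, w_enc, b_enc, w_dec, b_dec, k):
    """Encode x into at most k nonzero latents, then decode linearly.

    Shapes: w_enc is n_latents x d_model, w_dec is d_model x n_latents.
    """
    centered = [xi - bi for xi, bi in zip(x, b_dec)]  # hypothetical pre-bias step
    pre = [p + b for p, b in zip(matvec(w_enc, centered), b_enc)]
    latents = topk_relu(pre, k)
    recon = [r + b for r, b in zip(matvec(w_dec, latents), b_dec)]
    return latents, recon


if __name__ == "__main__":
    print(n_latents(D_MODEL, EXPANSION_FACTOR))  # 65536 latents
    # Toy sizes so the pure-Python matrices stay small.
    rng = random.Random(0)
    d, r, k = 4, 4, 2
    n = n_latents(d, r)
    w_enc = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    w_dec = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(d)]
    x = [rng.gauss(0.0, 1.0) for _ in range(d)]
    latents, recon = forward(x, w_enc, [0.0] * n, w_dec, [0.0] * d, k)
    # At most k latents are nonzero; the reconstruction has d entries.
    print(sum(v != 0.0 for v in latents), len(recon))
```

Because only k of the n latents can fire, the decoder touches just k columns per token, which is what makes the 65,536-wide dictionary tractable at this sparsity.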
config.json ADDED
@@ -0,0 +1,20 @@
+{
+  "accumulate_grad_batches": 64,
+  "auxk": 256,
+  "auxk_coef": 0.03125,
+  "batch_size": 1,
+  "dead_steps_threshold": null,
+  "dead_threshold": 0.001,
+  "dead_tokens_threshold": 10000000,
+  "expansion_factor": 64,
+  "k": 32,
+  "layers": [
+    21
+  ],
+  "lr": 0.0001,
+  "max_length": 2048,
+  "model_name": "EleutherAI/pythia-410m-deduped",
+  "skip_special_tokens": true,
+  "standardize": true,
+  "tuned_lens": false
+}
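Several of these hyperparameters combine into quantities the card does not state directly. The sketch below derives them under assumptions the commit does not confirm: each batch element is a full `max_length`-token sequence, "1 billion tokens" means exactly 10^9, and the residual stream width of pythia-410m-deduped is 1024. The `derived` helper is hypothetical.

```python
"""Derive training quantities from config.json (illustrative sketch).

Assumptions, none of which the commit states: each batch element is a
full max_length-token sequence, "1 billion tokens" means exactly 10**9,
and the residual stream width of pythia-410m-deduped is 1024.
"""
import json
import math

CONFIG_JSON = """
{
  "accumulate_grad_batches": 64,
  "auxk": 256,
  "auxk_coef": 0.03125,
  "batch_size": 1,
  "dead_steps_threshold": null,
  "dead_threshold": 0.001,
  "dead_tokens_threshold": 10000000,
  "expansion_factor": 64,
  "k": 32,
  "layers": [21],
  "lr": 0.0001,
  "max_length": 2048,
  "model_name": "EleutherAI/pythia-410m-deduped",
  "skip_special_tokens": true,
  "standardize": true,
  "tuned_lens": false
}
"""

D_MODEL = 1024        # assumed hidden size of EleutherAI/pythia-410m-deduped
TOTAL_TOKENS = 10**9  # "1 billion tokens" from the model card


def derived(cfg: dict) -> dict:
    """Compute a few quantities implied by the hyperparameters."""
    # Sequences per micro-batch x tokens per sequence x micro-batches per step.
    tokens_per_step = (
        cfg["batch_size"] * cfg["max_length"] * cfg["accumulate_grad_batches"]
    )
    latents = cfg["expansion_factor"] * D_MODEL
    return {
        "tokens_per_optimizer_step": tokens_per_step,
        "optimizer_steps": math.ceil(TOTAL_TOKENS / tokens_per_step),
        "n_latents": latents,
        "active_fraction": cfg["k"] / latents,
    }


if __name__ == "__main__":
    cfg = json.loads(CONFIG_JSON)
    for name, value in derived(cfg).items():
        print(f"{name}: {value}")
```

Under these assumptions each optimizer step sees 1 × 2048 × 64 = 131,072 tokens, so 1 billion tokens is about 7,630 steps, and only 32 of 65,536 latents (about 0.05%) are active per token. Note also that `auxk_coef` is exactly 1/32.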
model.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:a7b1bf8515b07ebaf27ada1df39a8510adbd0525700dc812b1122583352794b3
+size 2158251016
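model.safetensors is committed as a Git LFS pointer rather than the weights themselves: the pointer records the spec version, the SHA-256 of the real file, and its size in bytes (2,158,251,016, about 2.16 GB). A minimal sketch for checking a downloaded copy against the pointer, using only the standard library, follows. The function names are hypothetical, the local path you pass to `verify_file` is up to you, and the demo checks a small temporary file instead of the real weights.

```python
"""Check a downloaded file against a Git LFS pointer (illustrative sketch)."""
import hashlib
import os
import tempfile

POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:a7b1bf8515b07ebaf27ada1df39a8510adbd0525700dc812b1122583352794b3
size 2158251016
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line and unpack the sha256 oid and size."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    return {"version": fields["version"], "sha256": digest, "size": int(fields["size"])}


def verify_file(path: str, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Compare size first (cheap), then stream the file through SHA-256."""
    if os.path.getsize(path) != pointer["size"]:
        return False
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest() == pointer["sha256"]


if __name__ == "__main__":
    pointer = parse_lfs_pointer(POINTER)
    print(f"expected size: {pointer['size']} bytes (~{pointer['size'] / 1e9:.2f} GB)")
    # Self-check on a small temporary file with a matching, hand-built pointer.
    data = b"hello"
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(data)
    toy = {"version": pointer["version"],
           "sha256": hashlib.sha256(data).hexdigest(), "size": len(data)}
    print(verify_file(tmp.name, toy))
    os.remove(tmp.name)
```

Checking the size before hashing means a truncated download fails immediately, without reading 2 GB through SHA-256.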