---
license: cc-by-4.0
library_name: saelens
---

# 1. Gemma Scope

Gemma Scope is a comprehensive, open suite of Sparse Autoencoders for Gemma 2 9B and 2B. Sparse Autoencoders are a "microscope" of sorts that can help us break down a model’s internal activations into the underlying concepts, just as biologists use microscopes to study the individual cells of plants and animals.

See our [landing page](https://huggingface.co/google/gemma-scope) for details on the whole suite. This repository contains one specific set of SAEs from that suite.

# 2. What Is `gemma-scope-2b-pt-mlp`?

- `gemma-scope-`: See 1.
- `2b-pt-`: These SAEs were trained on the Gemma v2 2B base model.
- `mlp`: These SAEs were trained on the MLP sublayer outputs.

# 3. How can I use these SAEs straight away?

```python
from sae_lens import SAE  # pip install sae-lens

sae, cfg_dict, sparsity = SAE.from_pretrained(
    release = "gemma-scope-2b-pt-mlp-canonical",
    sae_id = "layer_0/width_16k/canonical",
)
```

This uses **canonical** SAEs, those with average L0 closest to 100, which we expect to be reasonably useful for most tasks. The exact SAEs selected as canonical are determined by this file in the SAELens repo, snapshotted on 22nd October 2024: https://github.com/jbloomAus/SAELens/blob/a470460/sae_lens/pretrained_saes.yaml#L2635

See https://github.com/jbloomAus/SAELens for details on this library.
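
The "average L0 closest to 100" rule for canonical SAEs can be illustrated with a minimal sketch. It is not the SAELens implementation: the helper names `average_l0` and `pick_canonical` are hypothetical, and the SAE IDs and L0 values in `candidate_l0` are made up for illustration. The sketch assumes that average L0 means the number of non-zero SAE features per token, averaged over tokens.

```python
def average_l0(feature_acts):
    """Mean number of non-zero features per token (one row per token)."""
    nonzero_per_token = [sum(1 for a in row if a != 0) for row in feature_acts]
    return sum(nonzero_per_token) / len(nonzero_per_token)


def pick_canonical(l0_by_id, target=100):
    """Return the SAE ID whose average L0 is closest to `target`."""
    return min(l0_by_id, key=lambda sae_id: abs(l0_by_id[sae_id] - target))


# Hypothetical SAE IDs and average-L0 values, for illustration only.
candidate_l0 = {
    "layer_0/width_16k/average_l0_30": 30,
    "layer_0/width_16k/average_l0_95": 95,
    "layer_0/width_16k/average_l0_210": 210,
}

canonical_id = pick_canonical(candidate_l0)
print(canonical_id)
```

In practice the mapping from each `canonical` ID to a concrete SAE is already fixed in the YAML file linked above, so this selection does not need to be redone by hand.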

# 4. Point of Contact

Point of contact: Arthur Conmy

Contact by email:

```python
''.join(list('moc.elgoog@ymnoc')[::-1])
```

HuggingFace account:
https://huggingface.co/ArthurConmyGDM

# 5. Citation

Paper: https://arxiv.org/abs/2408.05147