These are the first public interpreter models trained on a true reasoning model, and the first trained on any model of this scale. Because R1 is a very large model that most independent researchers cannot easily run, we're also uploading SQL databases containing the max-activating examples for each feature.
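As a sketch of how such a database might be browsed, the snippet below builds a tiny in-memory stand-in and queries the strongest examples for one feature. The table and column names here are hypothetical illustrations, not the released schema:

```python
import sqlite3

# Tiny in-memory stand-in for a max-activating-examples database.
# The schema (table and column names) is hypothetical, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE max_activating_examples "
    "(feature_id INTEGER, activation REAL, token_context TEXT)"
)
conn.executemany(
    "INSERT INTO max_activating_examples VALUES (?, ?, ?)",
    [
        (0, 3.2, "... therefore the answer is ..."),
        (0, 2.7, "... let us verify this step ..."),
        (1, 5.1, "... x^2 + 3x - 4 = 0 ..."),
    ],
)

# Fetch the strongest examples for one feature, highest activation first.
rows = conn.execute(
    "SELECT activation, token_context FROM max_activating_examples "
    "WHERE feature_id = ? ORDER BY activation DESC LIMIT 10",
    (0,),
).fetchall()
for activation, context in rows:
    print(f"{activation:.1f}  {context}")
```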
Model Information
This release contains two SAEs, one for general reasoning and one for math. After cloning our demo repo, you can load them with the following snippet:
```python
from sae import load_math_sae
from huggingface_hub import hf_hub_download

# Download the math SAE weights from the Hugging Face Hub.
file_path = hf_hub_download(
    repo_id="Goodfire/DeepSeek-R1-SAE-l37",
    filename="math/DeepSeek-R1-SAE-l37.pt",
    repo_type="model",
)

device = "cpu"
math_sae = load_math_sae(file_path, device)
```
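The demo repo defines the SAE's exact interface, but mechanically a sparse autoencoder maps a residual-stream activation to a wide, mostly-zero feature vector and reconstructs the activation from it. A minimal numpy sketch of that forward pass, where the toy dimensions and the ReLU encoder are illustrative assumptions rather than the released architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, d_features = 16, 64  # toy sizes; the real SAE is far wider
W_enc = rng.normal(scale=0.1, size=(d_model, d_features))
b_enc = np.zeros(d_features)
W_dec = rng.normal(scale=0.1, size=(d_features, d_model))
b_dec = np.zeros(d_model)

def sae_forward(x):
    """Encode an activation into sparse features, then reconstruct it."""
    features = np.maximum(x @ W_enc + b_enc, 0.0)  # ReLU keeps features sparse
    reconstruction = features @ W_dec + b_dec
    return features, reconstruction

x = rng.normal(size=d_model)  # stand-in for a layer-37 residual activation
features, x_hat = sae_forward(x)
print(features.shape, x_hat.shape)  # (64,) (16,)
```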
The general reasoning SAE was trained on R1's activations on our custom reasoning dataset, and the math SAE on OpenR1-Math, a large dataset for mathematical reasoning. These datasets allow us to discover the features R1 uses to answer challenging problems that exercise its reasoning abilities.
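For context, SAEs of this kind are typically trained to reconstruct the model's activations under a sparsity penalty. The objective below is a standard sketch of that trade-off, with made-up numbers and an illustrative `l1_coeff`, not the exact recipe used for this release:

```python
import numpy as np

def sae_loss(x, reconstruction, features, l1_coeff=1e-3):
    """Standard SAE objective: reconstruction error plus an L1 sparsity penalty."""
    mse = np.mean((x - reconstruction) ** 2)        # fidelity to the activation
    sparsity = l1_coeff * np.sum(np.abs(features))  # push most features to zero
    return mse + sparsity

# Toy values standing in for an activation, its reconstruction, and features.
x = np.array([1.0, -2.0, 0.5])
reconstruction = np.array([0.9, -1.8, 0.4])
features = np.array([0.0, 2.0, 0.0, 1.5])
print(sae_loss(x, reconstruction, features))
```

Raising `l1_coeff` makes feature activations sparser (and thus more interpretable) at the cost of reconstruction fidelity.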
Note: the originally uploaded version of the logic SAE was incorrect; the corrected version was uploaded on 4/17. If you have any difficulty running the SAEs, please reach out to us!