Goodfire/Llama-3.1-8B-Instruct-SAE-l19

goodfire-llama-3.1-8b-instruct-sae-l19

mechanistic interpretability

sparse autoencoder

namgoodfire committed 2 days ago

Commit fb23377 (verified) · 1 parent: 092488e

Update README.md

Files changed (1)

README.md +2 -2

@@ -15,8 +15,8 @@ tags:
 The Goodfire SAE (Sparse Autoencoder) for [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)
 is an interpreter model designed to analyze and understand
-the model's internal representations. This SAE model is trained specifically on layer 50 of
-Llama 3.3 70B and achieves an L0 count of 121, enabling the decomposition of complex neural activations
+the model's internal representations. This SAE model is trained specifically on layer 19 of
+Llama 3.1 8B and achieves an L0 count of 91, enabling the decomposition of complex neural activations
 into interpretable features. The model is optimized for interpretability tasks and model steering applications,
 allowing researchers and developers to gain insights into the model's internal processing and behavior patterns.
 As an open-source tool, it serves as a foundation for advancing interpretability research and enhancing control
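
The README quotes an "L0 count" without saying how it was measured. By the usual SAE convention, L0 is the average number of features with non-zero activation per token; that reading is an assumption here, not something the README states. A minimal sketch of the metric in plain Python, where `mean_l0` and the toy activations are hypothetical and not part of the Goodfire release:

```python
from typing import Sequence


def mean_l0(feature_acts: Sequence[Sequence[float]], eps: float = 0.0) -> float:
    """Average number of active features per token.

    feature_acts: one row per token, one column per SAE feature.
    eps: magnitude threshold above which a feature counts as active
         (assumed to be 0.0, i.e. any non-zero activation counts).
    """
    if not feature_acts:
        raise ValueError("need at least one token")
    # Count the active features in each token's activation vector.
    counts = [sum(1 for a in token if abs(a) > eps) for token in feature_acts]
    return sum(counts) / len(counts)


# Toy data: 3 tokens, 6 hypothetical SAE features each.
toy_acts = [
    [0.0, 1.2, 0.0, 0.0, 0.7, 0.0],  # 2 active features
    [0.0, 0.0, 0.0, 3.1, 0.0, 0.0],  # 1 active feature
    [0.5, 0.0, 0.9, 0.0, 0.4, 0.0],  # 3 active features
]
print(mean_l0(toy_acts))  # 2.0
```

Under this reading, an L0 of 91 would mean that roughly 91 features fire per token on average, out of the SAE's full feature dictionary.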