license: cc-by-4.0

Each directory corresponds to one model (datapoint) in the InterpBench dataset and is structured as follows:

- `task/`: directory name
  - `ll_model.pth`: the low-level transformer model
  - `ll_model_cfg.pkl`: a config for the transformer model
  - `meta.json`: training hyperparameters
  - `edges.pkl`: label for the circuit, i.e., the list of all edges that are part of the ground-truth circuit
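
As a minimal sketch of reading the label and hyperparameter files listed above, the following uses only the Python standard library. The helper name `load_task_labels` is hypothetical and not part of InterpBench; the structure of the objects stored in `meta.json` and `edges.pkl` is not specified here, so the code returns them unchanged.

```python
import json
import pickle
from pathlib import Path


def load_task_labels(task_dir):
    """Return (meta, edges) for one task directory (hypothetical helper)."""
    task_dir = Path(task_dir)
    # meta.json holds the training hyperparameters as plain JSON.
    with open(task_dir / "meta.json") as f:
        meta = json.load(f)
    # edges.pkl holds the ground-truth circuit edges.
    # Unpickling can execute arbitrary code, so only load files you trust.
    with open(task_dir / "edges.pkl", "rb") as f:
        edges = pickle.load(f)
    return meta, edges
```

Calling `meta, edges = load_task_labels("task")` on a downloaded directory should then give the hyperparameters and the circuit edges for that model.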

This repository of models is complementary to InterpBench's code repository, which should be used to load the models. Alternatively, the models can be loaded with TransformerLens using the ll_config.json.
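
A rough sketch of the TransformerLens route is given here, under several assumptions that this README does not confirm: that the config is read from the `ll_model_cfg.pkl` file in the directory listing, that it unpickles to either a `HookedTransformerConfig` or a plain dict, and that `ll_model.pth` stores a state dict. The helper names `model_files` and `load_ll_model` are hypothetical; InterpBench's own code repository remains the reference way to load the models.

```python
import pickle
from pathlib import Path


def model_files(task_dir):
    """Return the weight and config paths, checking that both exist."""
    task_dir = Path(task_dir)
    paths = {
        "weights": task_dir / "ll_model.pth",
        "config": task_dir / "ll_model_cfg.pkl",
    }
    for path in paths.values():
        if not path.is_file():
            raise FileNotFoundError(path)
    return paths


def load_ll_model(task_dir):
    """Build a HookedTransformer from one task directory (sketch)."""
    # Imported here so model_files() works without the heavy dependencies.
    import torch
    from transformer_lens import HookedTransformer, HookedTransformerConfig

    paths = model_files(task_dir)
    # Unpickling can execute arbitrary code, so only load files you trust.
    with open(paths["config"], "rb") as f:
        cfg = pickle.load(f)
    # Assumption: the pickle holds either a config object or a plain dict.
    if isinstance(cfg, dict):
        cfg = HookedTransformerConfig.from_dict(cfg)
    model = HookedTransformer(cfg)
    # Assumption: ll_model.pth is a state dict, not a whole pickled module.
    state_dict = torch.load(paths["weights"], map_location="cpu")
    model.load_state_dict(state_dict)
    return model
```

If the assumptions hold, `model = load_ll_model("task")` returns a model that can be run and hooked like any other `HookedTransformer`.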

The full paper is available on arXiv: InterpBench: Semi-Synthetic Transformers for Evaluating Mechanistic Interpretability Techniques