license: cc-by-4.0
Each directory corresponds to a model/datapoint in the InterpBench dataset and is structured as follows:
- task // directory name
-- ll_model.pth // the low level transformer model
-- ll_model_cfg.pkl // a config for the transformer model
-- meta.json // training hyperparams
-- edges.pkl // label for the circuit, i.e., the list of all edges that are part of the ground-truth circuit
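
The layout above can be checked and inspected with the standard library alone. The following is a minimal sketch, not part of InterpBench's code: the helper name `inspect_task_dir` is hypothetical, and the internal format of `meta.json` and `edges.pkl` is not specified here, so both are returned as loaded. Note that `pickle.load` can execute arbitrary code, so only unpickle files from a source you trust.

```python
import json
import pickle
from pathlib import Path

# File names taken from the directory listing above.
EXPECTED_FILES = ("ll_model.pth", "ll_model_cfg.pkl", "meta.json", "edges.pkl")


def inspect_task_dir(task_dir):
    """Report missing files and load the hyperparameters and circuit edges.

    Hypothetical helper for illustration; returns a plain dict.
    """
    task_dir = Path(task_dir)
    missing = [name for name in EXPECTED_FILES if not (task_dir / name).is_file()]

    meta = None
    meta_path = task_dir / "meta.json"
    if meta_path.is_file():
        with open(meta_path, encoding="utf-8") as f:
            meta = json.load(f)  # training hyperparameters

    edges = None
    edges_path = task_dir / "edges.pkl"
    if edges_path.is_file():
        with open(edges_path, "rb") as f:
            edges = pickle.load(f)  # edges of the ground-truth circuit

    return {"missing": missing, "meta": meta, "edges": edges}
```

Calling `inspect_task_dir("path/to/task")` on a downloaded task directory should return an empty `missing` list when all four files are present.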
This repository of models is complementary to InterpBench's code repository, which should be used to load the models. Alternatively, the models can be loaded with TransformerLens using the ll_config.json
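
As a rough sketch of the TransformerLens route, the snippet below builds a `HookedTransformer` from the pickled config and loads the weights into it. It rests on several assumptions that this README does not confirm: that `ll_model_cfg.pkl` unpickles to a `HookedTransformerConfig` (or a dict of its fields), that `ll_model.pth` holds a state dict rather than a whole pickled model, and that the parameter names match TransformerLens's. The function name `load_ll_model` is hypothetical. InterpBench's code repository remains the recommended way to load the models.

```python
import pickle
from pathlib import Path


def load_ll_model(task_dir):
    """Hypothetical loader: build a HookedTransformer for one task directory."""
    # Imported lazily so the module can be imported without torch installed.
    import torch
    from transformer_lens import HookedTransformer, HookedTransformerConfig

    task_dir = Path(task_dir)

    # Assumption: the pickle holds a HookedTransformerConfig or a dict of its fields.
    with open(task_dir / "ll_model_cfg.pkl", "rb") as f:
        cfg = pickle.load(f)
    if isinstance(cfg, dict):
        cfg = HookedTransformerConfig(**cfg)

    model = HookedTransformer(cfg)

    # Assumption: the .pth file is a state dict with matching parameter names.
    state_dict = torch.load(task_dir / "ll_model.pth", map_location="cpu")
    model.load_state_dict(state_dict)
    model.eval()
    return model
```

If `load_state_dict` reports missing or unexpected keys, one of the assumptions above does not hold for that task, and the loader in the code repository should be used instead.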
The full paper can be read on arXiv: InterpBench: Semi-Synthetic Transformers for Evaluating Mechanistic Interpretability Techniques