
Model connectomes: A generational approach to data-efficient language models

Second Workshop on Representational Alignment at ICLR 2025

By: Klemen Kotar & Greta Tuckute


[Paper figure]


Released Models

We have released the following pretrained Generational Connectome GPT models on the Hugging Face Hub:

Model                                     Description
TuKoResearch/ConnectomeGPT100M            Generational Pruning GPT with learned connectome
TuKoResearch/RandomConnectomeGPT100M      Generational Pruning GPT with random connectome
TuKoResearch/NoConnectomeGPT100M          Generational Pruning GPT without any connectome

You can evaluate any of these models on downstream NLP benchmarks by specifying the --model_name flag in the evaluation scripts.
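For quick inspection, the sketch below shows how a released checkpoint could be loaded directly from the Hub. It assumes the Hub repositories ship transformers-compatible modeling code (hence trust_remote_code=True) and use the GPT-2 tokenizer, as in the evaluation commands below; if the checkpoints instead require the model class from this repository, load them through that class.

# Hedged sketch: load a released checkpoint from the Hub and generate a short continuation.
# Assumes the Hub repo is loadable via transformers' Auto classes with trust_remote_code=True
# and uses the GPT-2 tokenizer (as in the evaluation commands below).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained(
    "TuKoResearch/ConnectomeGPT100M", trust_remote_code=True
)
model.eval()

inputs = tokenizer("Language models with a connectome", return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))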


Installation

  1. Clone the repo

    git clone https://github.com/TuKoResearch/GenerationalConnectomes.git
    cd GenerationalConnectomes
    
  2. Create & activate a Conda environment

    conda create -n GenerationalConnectomes python=3.11 -y
    conda activate GenerationalConnectomes
    
  3. Install PyTorch 2.6 (pick the build matching your CUDA setup; see pytorch.org for the install command for your platform)

    pip install torch==2.6.0 torchvision torchaudio
    
  4. Install the remaining dependencies

    pip install --upgrade pip
    pip install -r requirements.txt
    
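After installation, a quick check that PyTorch is importable and sees your GPU (illustrative, not part of the repo):

# Quick environment check (illustrative, not part of the repo).
import torch

print(torch.__version__)          # expect 2.6.x
print(torch.cuda.is_available())  # expect True on a CUDA machine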

NLP Evaluations

We provide evaluation scripts for MMLU and HellaSwag in evals/. You can reproduce our evaluations by running the commands below with the model checkpoints from the Hugging Face Hub (a general sketch of likelihood-based multiple-choice scoring follows the commands):

  1. Run MMLU:

    python evals/mmlu.py \
      --model_name TuKoResearch/ConnectomeGPT100M \
      --tokenizer_name gpt2 \
      --device cuda:0
    
  2. Run HellaSwag:

    python evals/hellaswag.py \
      --model_name TuKoResearch/ConnectomeGPT100M \
      --tokenizer_name gpt2 \
      --device cuda:0
    
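The sketch below illustrates the general idea behind likelihood-based multiple-choice evaluation: each candidate ending is scored by the total log-probability the model assigns to its tokens given the context. Names and the example item are illustrative; this is not the code in evals/hellaswag.py, which may differ in details.

# General sketch of multiple-choice scoring by continuation log-likelihood
# (illustrative; not the code in evals/hellaswag.py).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def ending_logprob(model, tokenizer, context, ending):
    """Sum of log-probabilities the model assigns to the ending tokens, given the context."""
    ctx_ids = tokenizer(context, return_tensors="pt").input_ids
    end_ids = tokenizer(ending, return_tensors="pt").input_ids
    input_ids = torch.cat([ctx_ids, end_ids], dim=1)
    with torch.no_grad():
        logits = model(input_ids).logits
    # Logits at position t predict token t+1, so slice the positions that predict the ending.
    pred = logits[0, ctx_ids.shape[1] - 1 : -1]
    logprobs = torch.log_softmax(pred, dim=-1)
    return logprobs.gather(1, end_ids[0].unsqueeze(1)).sum().item()

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("TuKoResearch/ConnectomeGPT100M", trust_remote_code=True)
context = "A man is sitting on a roof. He"
endings = [" starts pulling up roofing on a roof.", " is ripping level tiles off."]
best = max(range(len(endings)), key=lambda i: ending_logprob(lm, tok, context, endings[i]))
print("predicted ending:", endings[best].strip())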

Behavioral alignment

We use the Futrell2018 reading-time benchmark, which is available through brain-score language and can be loaded in an environment with xarray installed. The data can be downloaded here.

Once downloaded, place the Futrell2018 reading-time dataset (assy_Futrell2018.nc) in a directory called data/.
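Assuming the file is a standard NetCDF assembly, it can be opened with xarray for inspection (the exact coordinate layout may differ from what this sketch prints):

# Inspect the reading-time assembly with xarray (coordinate names may differ).
import xarray as xr

assembly = xr.open_dataset("data/assy_Futrell2018.nc")
print(assembly)  # dimensions, coordinates, and data variables of the benchmark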

To run the surprisal evaluation script and compute the Pearson correlation between model surprisal and human reading times (for the final checkpoint), execute:

python surprisal_eval.py \
  --model_name TuKoResearch/ConnectomeGPT100M \
  --tokenizer_name gpt2 \
  --device cuda:0
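The underlying computation is per-token surprisal (negative log-probability) under the model, correlated with human reading times. The sketch below illustrates that computation with made-up reading times; it is not surprisal_eval.py.

# Sketch of surprisal vs. reading-time correlation (illustrative; not surprisal_eval.py).
import numpy as np
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("TuKoResearch/ConnectomeGPT100M", trust_remote_code=True)

ids = tok("The old man the boats.", return_tensors="pt").input_ids
with torch.no_grad():
    logits = lm(ids).logits
logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
surprisal = -logprobs.gather(1, ids[0, 1:].unsqueeze(1)).squeeze(1)  # one value per predicted token

# Hypothetical reading times (ms) for illustration only; the real script uses Futrell2018.
reading_times = np.array([310.0, 295.0, 420.0, 380.0, 300.0])
n = min(len(reading_times), surprisal.shape[0])
r = np.corrcoef(surprisal[:n].numpy(), reading_times[:n])[0, 1]
print(f"Pearson r = {r:.3f}")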

Neural alignment

We use the Tuckute2024 neural benchmark, which can be downloaded from the following public repository or from brain-score language. The cross-validated neural predictivity score can be computed with NeuralAlignment/fit_mapping.py and looped across layers and models with NeuralAlignment/loop_fit_mapping.py.
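The sketch below illustrates the general cross-validated predictivity recipe: fit a linear mapping (ridge regression here) from model activations to neural responses on training folds, then correlate held-out predictions with the measured responses. The arrays are random stand-ins; the exact mapping class and cross-validation scheme in NeuralAlignment/fit_mapping.py may differ.

# General sketch of cross-validated neural predictivity (illustrative; the repo's
# fit_mapping.py may use a different mapping or CV scheme).
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 768))  # stand-in model activations (sentences x units)
Y = rng.normal(size=(200, 50))   # stand-in neural responses (sentences x recording sites)

fold_scores = []
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    mapping = Ridge(alpha=1.0).fit(X[train_idx], Y[train_idx])
    Y_pred = mapping.predict(X[test_idx])
    # Pearson r per recording site on the held-out fold, averaged across sites.
    r_per_site = [np.corrcoef(Y_pred[:, i], Y[test_idx, i])[0, 1] for i in range(Y.shape[1])]
    fold_scores.append(np.mean(r_per_site))
print("cross-validated predictivity:", np.mean(fold_scores))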

In some of the analyses, we first localize the LLM language units, following the approach established in AlKhamissi et al., 2025 (ACL), using code from the following repository. We adapted this code to output a binary mask that marks the LLM language units as 1. The NeuralAlignment/apply_langloc_mask.py script takes the NumPy binary mask for a given model and saves the masked embedding values as a CSV file, which can then serve as the input to NeuralAlignment/fit_mapping.py.
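A minimal sketch of that masking step, assuming hypothetical file names and shapes (the actual script's interface may differ):

# Sketch of applying a binary language-unit mask to model embeddings and saving a CSV
# (file names and shapes are hypothetical; see NeuralAlignment/apply_langloc_mask.py).
import numpy as np
import pandas as pd

embeddings = np.load("model_embeddings.npy")  # (n_sentences, n_units), hypothetical file
lang_mask = np.load("langloc_mask.npy")       # (n_units,), 1 marks an LLM language unit

masked = embeddings * lang_mask               # zero out non-language units
# Alternatively, keep only the language units: embeddings[:, lang_mask.astype(bool)]
pd.DataFrame(masked).to_csv("masked_embeddings.csv", index=False)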

The regression outputs can be downloaded here.


LLM Training

Once your environment is ready, train the Generational Pruning GPT model from a pruned checkpoint with:

# Single-GPU debug run
python train.py \
  --run_name my_experiment \
  --train_data_dir path/to/train/*.bin \
  --val_data_dir path/to/val/*.bin \
  --wandb            # (optional: log to Weights & Biases)

# Multi-GPU DDP run
torchrun --standalone --nproc_per_node=8 train.py \
  --run_name my_experiment \
  --train_data_dir path/to/train/*.bin \
  --val_data_dir path/to/val/*.bin \
  --per_device_batch_size 16 \
  --batch_size 512 \
  --wandb

Key flags:

  • --run_name: name for output folder under ./out/ and (optionally) W&B run.
  • --train_data_dir / --val_data_dir: glob pattern for .bin tokenized data.
  • --per_device_batch_size: batch size per GPU.
  • --batch_size: total batch size, split across GPUs (see the sketch after this list).
  • --wandb: enable logging to Weights & Biases.
  • --push_to_hf: after training, upload final model to Hugging Face Hub under repo name --run_name.
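As a concrete example of how --batch_size and --per_device_batch_size likely interact (an assumption about train.py, in which the remainder would be covered by gradient accumulation):

# Assumed relationship between the batch-size flags (not taken from train.py).
world_size = 8               # GPUs, i.e. torchrun --nproc_per_node
per_device_batch_size = 16   # --per_device_batch_size
batch_size = 512             # --batch_size (global)

grad_accum_steps = batch_size // (per_device_batch_size * world_size)
print(grad_accum_steps)      # 512 // (16 * 8) = 4 accumulation steps per optimizer update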

All other flags (learning rate, scheduler, pruning init, etc.) can be viewed with:

python train.py --help

To run the pruning training, use:

python train_itp.py \
  --run_name my_experiment \
  --train_data_dir path/to/train/*.bin \
  --val_data_dir path/to/val/*.bin \
  --wandb            # (optional: log to Weights & Biases)

This will save a checkpoint to out/<my_experiment>, which you can use as the connectome for the inner-loop training described above.
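Conceptually, the connectome is a binary mask over model weights that the inner-loop training keeps fixed. The sketch below illustrates the idea with a masked linear layer; it is a generic illustration, not the repo's model code.

# Generic illustration of training with a fixed binary "connectome" mask
# (not the repo's model implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedLinear(nn.Linear):
    def __init__(self, in_features, out_features, mask):
        super().__init__(in_features, out_features)
        self.register_buffer("mask", mask)  # binary (out_features, in_features) connectome

    def forward(self, x):
        # Pruned connections stay zero on every forward pass.
        return F.linear(x, self.weight * self.mask, self.bias)

mask = (torch.rand(32, 64) > 0.5).float()  # stand-in mask; in practice derived from the pruning run
layer = MaskedLinear(64, 32, mask)
print(layer(torch.randn(4, 64)).shape)     # torch.Size([4, 32])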


Citation

If you use this code, please cite:

Kotar, K., & Tuckute, G. (2025). Model connectomes: A generational approach to data-efficient language models. Second Workshop on Representational Alignment at ICLR 2025.