Mission: Impossible Language Models
💥 Mission: Impossible Language Models 💥
This page hosts the models trained and used in the paper "Mission: Impossible Language Models" (Kallini et al., 2024). If you use our code or models, please cite our ACL paper:
@inproceedings{kallini-etal-2024-mission,
title = "Mission: Impossible Language Models",
author = "Kallini, Julie and
Papadimitriou, Isabel and
Futrell, Richard and
Mahowald, Kyle and
Potts, Christopher",
editor = "Ku, Lun-Wei and
Martins, Andre and
Srikumar, Vivek",
booktitle = "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
month = aug,
year = "2024",
address = "Bangkok, Thailand",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.acl-long.787",
doi = "10.18653/v1/2024.acl-long.787",
pages = "14691--14714",
}
Impossible Languages
Our paper includes 15 impossible languages, grouped into three language classes:
- *Shuffle languages involve different shuffles of tokenized English sentences.
- *Reverse languages involve reversals of all or part of input sentences.
- *Hop languages perturb verb inflection with counting rules.
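To make the first two classes concrete, here is a toy sketch of a seeded shuffle and a partial reversal over a token list. It is only an illustration: the function names, the seed handling, and the `start` parameter are assumptions made for this example, not the perturbation functions used in the paper, and the *Hop class is not covered.

```python
import random


def shuffle_tokens(tokens, seed):
    """Return a shuffled copy of `tokens` (illustrative, not the paper's rule).

    A fixed seed makes the shuffle reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(tokens)  # copy so the input is left untouched
    rng.shuffle(shuffled)
    return shuffled


def reverse_tokens(tokens, start=0):
    """Reverse the tokens from index `start` to the end.

    start=0 reverses the whole sequence; a larger value reverses only the tail.
    The `start` parameter is a hypothetical way to model "all or part".
    """
    tokens = list(tokens)
    return tokens[:start] + tokens[start:][::-1]


if __name__ == "__main__":
    sentence = ["the", "cat", "sat", "on", "the", "mat"]
    print(shuffle_tokens(sentence, seed=0))
    print(reverse_tokens(sentence))
    print(reverse_tokens(sentence, start=3))
```

Both helpers keep the multiset of tokens intact and only change their order, which is the property the two descriptions above share.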
Models
For each language, we provide two models: a standard GPT-2 model and a GPT-2 model without positional encodings.
Each model is trained from scratch exclusively on data from one impossible language. This makes a total of 30 models: 15 standard GPT-2 models and 15 GPT-2 models without positional encodings. We group these models into two collections below for easier navigation.
Model names match the following pattern:
mission-impossible-lms/{language_name}-{model_architecture}
where language_name is the name of an impossible language from the table above, converted from PascalCase to kebab-case (e.g. NoShuffle -> no-shuffle), and model_architecture is one of gpt2 (for the standard GPT-2 architecture) or gpt2-no-pos (for the GPT-2 architecture without positional encodings).
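The naming pattern can be sketched as a small helper. The function names `pascal_to_kebab` and `repo_id` are made up for this example, and the conversion assumes language names are plain PascalCase words like NoShuffle; names with digits or runs of capitals may need different handling.

```python
import re

ORG = "mission-impossible-lms"
ARCHITECTURES = ("gpt2", "gpt2-no-pos")


def pascal_to_kebab(name: str) -> str:
    """Convert a PascalCase name to kebab-case, e.g. NoShuffle -> no-shuffle."""
    # Insert a hyphen before every uppercase letter except the first, then lowercase.
    return re.sub(r"(?<!^)(?=[A-Z])", "-", name).lower()


def repo_id(language_name: str, model_architecture: str) -> str:
    """Build the repo ID {org}/{language_name}-{model_architecture}."""
    if model_architecture not in ARCHITECTURES:
        raise ValueError(f"model_architecture must be one of {ARCHITECTURES}")
    return f"{ORG}/{pascal_to_kebab(language_name)}-{model_architecture}"


if __name__ == "__main__":
    print(repo_id("NoShuffle", "gpt2"))
    print(repo_id("NoShuffle", "gpt2-no-pos"))
```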
Model Checkpoints
On the main revision of each model, we provide the final model artefact we trained (checkpoint 3000). We also provide 29 intermediate checkpoints over the course of training, taken every 100 steps (checkpoints 100 through 2900), so that together with the final checkpoint they cover steps 100 to 3000. These checkpoints can help you replicate the experiments we show in the paper and are provided in each model repo as separate revisions.
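A minimal sketch of working with these revisions is shown below. It assumes the `huggingface_hub` and `transformers` packages are installed and that the standard gpt2 variant loads through `AutoModelForCausalLM`; the gpt2-no-pos variant may need the authors' own model code. Revision names are not stated on this page, so the sketch lists the branches of a repo rather than guessing them. The helper names are hypothetical.

```python
def checkpoint_steps(first=100, last=3000, every=100):
    """Training steps with a saved checkpoint, including the final one."""
    return list(range(first, last + 1, every))


def list_revisions(repo):
    """Return the branch names of a model repo (one per revision)."""
    from huggingface_hub import list_repo_refs  # lazy import, needs network

    refs = list_repo_refs(repo)
    return sorted(branch.name for branch in refs.branches)


def load_checkpoint(repo, revision="main"):
    """Load a model at a given revision; "main" is the final checkpoint."""
    from transformers import AutoModelForCausalLM  # lazy import

    return AutoModelForCausalLM.from_pretrained(repo, revision=revision)


if __name__ == "__main__":
    repo = "mission-impossible-lms/no-shuffle-gpt2"
    print(checkpoint_steps()[:3], "...", checkpoint_steps()[-1])
    # Inspect the actual revision names before picking one to load.
    print(list_revisions(repo))
    model = load_checkpoint(repo)  # final checkpoint on the main revision
```

Pass one of the names printed by `list_revisions` as `revision` to load an intermediate checkpoint.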