mission-impossible-lms / nondeterministic-shuffle-gpt2-no-pos

juliekallini committed on Nov 4: Create README.md (commit 1fe3864, 1 parent: 2bf0079)

Files changed (1): README.md (added, +126 -0)

---
# For reference on model card metadata, see the spec: https://github.com/huggingface/hub-docs/blob/main/modelcard.md?plain=1
# Doc / guide: https://huggingface.co/docs/hub/model-cards
{}
---

# Model Card for *NondeterministicShuffle* GPT-2 (without Positional Encodings)

This is one model in a collection of models trained on the impossible
languages of [Kallini et al. 2024](https://arxiv.org/abs/2401.06416).
This model is a GPT-2 Small model trained from scratch, *without positional encodings*,
on the ***NondeterministicShuffle*** language. This repository includes 30 checkpoints
saved over the course of training, from step 100 to step 3000 in increments of 100 steps.
The `main` branch contains the final checkpoint (step 3000), and the other
checkpoints are accessible as revisions.

![languages.png](https://cdn-uploads.huggingface.co/production/uploads/6268bc06adb1c6525b3d5157/pBt38YYQL1gj8DqjyorWS.png)
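
In standard GPT-2, the input to the first transformer block is the sum of a token embedding and a learned absolute position embedding. The pure-Python toy below assumes that the no-positional-encoding variant simply drops the position term. The repository's custom modeling code is the authoritative version, and the sizes and random values here are illustrative only. Under that assumption, the same token receives the same input vector at every position, so any word-order information must come from elsewhere, such as the causal attention mask.

```python
import random
from typing import List

# Toy dimensions for illustration only (GPT-2 Small uses 768-dimensional
# embeddings and up to 1,024 positions).
VOCAB_SIZE, MAX_POSITIONS, DIM = 10, 8, 4
rng = random.Random(0)

# wte: token embeddings; wpe: learned absolute position embeddings
# (the names mirror the attributes of Hugging Face's GPT2Model).
wte = [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(VOCAB_SIZE)]
wpe = [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(MAX_POSITIONS)]


def embed(token_ids: List[int], use_positions: bool) -> List[List[float]]:
    """Build the input vectors fed to the first transformer block."""
    vectors = []
    for position, token_id in enumerate(token_ids):
        vec = list(wte[token_id])
        if use_positions:
            # Standard GPT-2: add the absolute position embedding.
            vec = [t + p for t, p in zip(vec, wpe[position])]
        vectors.append(vec)
    return vectors


ids = [3, 5, 3]  # token 3 appears at positions 0 and 2
with_pos = embed(ids, use_positions=True)
without_pos = embed(ids, use_positions=False)

print(with_pos[0] == with_pos[2])        # False: position changes the input
print(without_pos[0] == without_pos[2])  # True: identical without wpe
```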

## Model Details

- **Developed by:** Julie Kallini, Isabel Papadimitriou, Richard Futrell, Kyle Mahowald, Christopher Potts
- **Model type:** Causal Language Model
- **Language(s) (NLP):** English
- **GitHub Repository:** https://github.com/jkallini/mission-impossible-language-models
- **Paper:** https://arxiv.org/pdf/2401.06416

## Uses

This artefact is solely intended for the study of language learning
and acquisition in computational models. It should not be
used in any production setting.

## How to Get Started with the Model

Use the code below to get started with the model.

**Important:** This will download our modified GPT-2 code, which does
not use absolute positional encodings (this is why `trust_remote_code=True`
is required). If you use this model in the same environment as another
GPT-2 model that has positional encodings, load the second model
explicitly as a `GPT2Model`.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer; trust_remote_code=True pulls in the custom
# GPT-2 code without positional encodings
model_id = "mission-impossible-lms/nondeterministic-shuffle-gpt2-no-pos"
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Set up the prompt and encode it
prompt = "He clean"
inputs = tokenizer(prompt, return_tensors="pt")

# Generate text (passes input_ids and attention_mask; GPT-2 has no pad
# token, so the EOS token is reused for padding)
output = model.generate(**inputs, max_length=20, pad_token_id=tokenizer.eos_token_id)

# Decode and print the generated text
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(generated_text)
```

By default, the `main` branch of this model repo loads the
last model checkpoint (3000). To access the other checkpoints,
use the `revision` argument:

```python
model = AutoModelForCausalLM.from_pretrained(
    model_id, revision="checkpoint-500", trust_remote_code=True
)
```

This loads the model at checkpoint 500.
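
To iterate over the whole training trajectory, it can help to map training steps to revision names. The helper below is hypothetical. It assumes every intermediate checkpoint follows the `checkpoint-<step>` naming used in the example above, and that the final checkpoint (step 3000) is served from `main`.

```python
from typing import List

FINAL_STEP = 3000
STEP_INTERVAL = 100


def revision_for_step(step: int) -> str:
    """Map a training step to a Hub revision name (hypothetical helper).

    Assumes intermediate checkpoints are named "checkpoint-<step>" and
    that the final checkpoint lives on the `main` branch.
    """
    if step % STEP_INTERVAL != 0 or not STEP_INTERVAL <= step <= FINAL_STEP:
        raise ValueError(f"no checkpoint saved at step {step}")
    return "main" if step == FINAL_STEP else f"checkpoint-{step}"


def all_revisions() -> List[str]:
    """Revision names for all 30 checkpoints, in training order."""
    return [
        revision_for_step(step)
        for step in range(STEP_INTERVAL, FINAL_STEP + 1, STEP_INTERVAL)
    ]


revisions = all_revisions()
print(len(revisions))               # 30
print(revisions[0], revisions[-1])  # checkpoint-100 main
```

Each name can then be passed as the `revision` argument of `AutoModelForCausalLM.from_pretrained` (together with `trust_remote_code=True`). This card does not say whether a separate `checkpoint-3000` branch also exists, so the sketch sends step 3000 to `main`.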

## Training Details

### Training Data

This model was trained on the [100M-word BabyLM dataset](https://babylm.github.io/).
Before training, we transformed the dataset into
the *NondeterministicShuffle* impossible language, as described in
our paper.
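
Roughly speaking, *NondeterministicShuffle* randomly permutes the tokens within each sentence, with no fixed mapping from a sentence to its shuffled order. The paper gives the exact definition, and the GitHub repository contains the actual preprocessing. The sketch below only illustrates the idea. The function name is hypothetical, and whitespace-split words stand in for GPT-2 tokens.

```python
import random
from typing import List, Optional


def nondeterministic_shuffle(
    tokens: List[str], rng: Optional[random.Random] = None
) -> List[str]:
    """Return a randomly permuted copy of one sentence's tokens (illustrative sketch).

    Every call draws a fresh permutation, so the same sentence can come out
    in a different order each time it is transformed.
    """
    rng = rng if rng is not None else random.Random()
    shuffled = list(tokens)  # copy so the input sentence is left untouched
    rng.shuffle(shuffled)
    return shuffled


# Assumption: whitespace tokenization stands in for the real GPT-2 tokenizer.
sentence = "he cleans his very messy bookshelf .".split()
rng = random.Random(42)  # seeded only to make this demo reproducible
first = nondeterministic_shuffle(sentence, rng)
second = nondeterministic_shuffle(sentence, rng)
print(first)
print(second)
```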

### Training Procedure

This model was trained for 3,000 gradient steps with
a batch size of 2^19 (524,288) tokens. The learning rate
warms up linearly from 0 to 6e-4 over the first 300 steps.
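
The training figures above imply a total token budget and a warmup curve. The short calculation below works these out. The 1,024-token sequence length is an assumption (GPT-2's default context size) used only to convert tokens per batch into sequences per batch. The schedule after warmup is not described in this card, so the sketch covers the warmup phase only.

```python
# Numbers from the training description above.
TOTAL_STEPS = 3_000
TOKENS_PER_BATCH = 2 ** 19  # 524,288 tokens per gradient step
PEAK_LR = 6e-4
WARMUP_STEPS = 300

# Assumption: 1,024-token sequences (GPT-2's default context length).
SEQ_LEN = 1_024


def warmup_lr(step: int) -> float:
    """Learning rate during the linear warmup from 0 to PEAK_LR.

    The schedule after warmup is not described in this card, so this
    sketch only covers steps 0..WARMUP_STEPS.
    """
    if not 0 <= step <= WARMUP_STEPS:
        raise ValueError("step is outside the warmup phase")
    return PEAK_LR * step / WARMUP_STEPS


total_tokens = TOTAL_STEPS * TOKENS_PER_BATCH
sequences_per_batch = TOKENS_PER_BATCH // SEQ_LEN

print(f"{total_tokens:,} tokens processed in training")  # 1,572,864,000 tokens processed in training
print(sequences_per_batch)                               # 512
print(warmup_lr(150))                                    # about 3e-4, halfway through warmup
```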

## Environmental Impact

- **Hardware Type:** NVIDIA RTX 3090 (24GB) + NVIDIA RTX A6000 (48GB) GPUs.
- **Hours used:** ~24 hours.

## Citation

```bibtex
@inproceedings{kallini-etal-2024-mission,
    title = "Mission: Impossible Language Models",
    author = "Kallini, Julie  and
      Papadimitriou, Isabel  and
      Futrell, Richard  and
      Mahowald, Kyle  and
      Potts, Christopher",
    editor = "Ku, Lun-Wei  and
      Martins, Andre  and
      Srikumar, Vivek",
    booktitle = "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = aug,
    year = "2024",
    address = "Bangkok, Thailand",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.acl-long.787",
    doi = "10.18653/v1/2024.acl-long.787",
    pages = "14691--14714",
}
```

## Model Card Authors

Julie Kallini

## Model Card Contact

[email protected]