From Babble to Words
Collection · 12 items

The models, tokenizers, and datasets used in From Babble to Words, one of the winning BabyLM 2024 submissions, exploring phoneme-based training.
GPT-2 trained on the BabyLM 2024 training set (transcribed into IPA) using a character-based tokenizer.
Model trained for From Babble to Words: Pre-Training Language Models on Continuous Streams of Phonemes.
Base model: openai-community/gpt2
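To illustrate what a character-based tokenizer over IPA text looks like, here is a minimal sketch. This is illustrative only, not the tokenizer released with this collection; the `CharTokenizer` class, its vocabulary construction, and the example IPA string are all assumptions for demonstration.

```python
class CharTokenizer:
    """Toy character-level tokenizer (illustrative; not the released one)."""

    def __init__(self, corpus):
        # Build the vocabulary from every distinct character in the corpus.
        chars = sorted(set("".join(corpus)))
        self.stoi = {c: i for i, c in enumerate(chars)}
        self.itos = {i: c for c, i in self.stoi.items()}

    def encode(self, text):
        # One token id per character, including IPA symbols and spaces.
        return [self.stoi[c] for c in text]

    def decode(self, ids):
        return "".join(self.itos[i] for i in ids)


# Hypothetical IPA transcription used only to exercise the tokenizer.
corpus = ["ðə kæt sæt ɒn ðə mæt"]
tok = CharTokenizer(corpus)
ids = tok.encode(corpus[0])
assert tok.decode(ids) == corpus[0]  # lossless round trip
```

Because every character is its own token, the vocabulary stays tiny and no phoneme symbol is ever split or merged, which is the property that matters when training on continuous streams of phonemes.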