IPA CHILDES Models

Phoneme-based GPT-2 models trained on the 11 largest sections of the IPA-CHILDES dataset for our paper, "IPA-CHILDES & G2P+: Feature-Rich Resources for Cross-Lingual Phonology and Phonemic Language Modeling".

All models have 5M non-embedding parameters, and each was trained on 1.8M tokens of its language. The models were then probed for phonetic features using the corresponding Phoible inventories. See the paper for more details. Training and analysis scripts can be found here.
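To make the "non-embedding parameters" figure concrete, here is a minimal sketch of how such a count works out for a standard GPT-2 block (biases included, 4x feed-forward expansion). The helper name `gpt2_non_embedding_params` and the 6-layer, 256-dimension configuration are assumptions chosen for illustration only; the models' actual hyperparameters are given in the paper, not here.

```python
def gpt2_non_embedding_params(n_layer: int, d_model: int) -> int:
    """Count GPT-2 parameters outside the token and position embeddings.

    Assumes the standard GPT-2 block: fused QKV projection, output
    projection, a feed-forward layer with 4x expansion, and biases throughout.
    """
    # QKV projection (d -> 3d) plus attention output projection (d -> d).
    attention = (3 * d_model * d_model + 3 * d_model) + (d_model * d_model + d_model)
    # Feed-forward up-projection (d -> 4d) plus down-projection (4d -> d).
    feed_forward = (4 * d_model * d_model + 4 * d_model) + (4 * d_model * d_model + d_model)
    # Two layer norms per block, each with a weight and a bias vector.
    layer_norms = 2 * (2 * d_model)
    per_block = attention + feed_forward + layer_norms
    # One final layer norm after the last block.
    final_layer_norm = 2 * d_model
    return n_layer * per_block + final_layer_norm


# Hypothetical configuration, used only to illustrate the arithmetic.
total = gpt2_non_embedding_params(n_layer=6, d_model=256)
print(f"{total:,}")  # 4,739,072
```

Under these assumed settings the count comes to about 4.7M, which is the order of magnitude the 5M figure refers to. Token and position embeddings are excluded because their size depends on the phoneme vocabulary and context length rather than on model depth or width.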

To load a model:

from transformers import AutoModel

# Each language's model is stored in its own subfolder of the repository.
dutch_model = AutoModel.from_pretrained('phonemetransformers/ipa-childes-models', subfolder='Dutch')
