---
license: mit
language:
- en
tags:
- babylm
---
# Lil-Bevo-X
Lil-Bevo-X is UT Austin's submission to the BabyLM Challenge, specifically the strict track.
## TL;DR
- A Unigram tokenizer is trained on 10M BabyLM tokens plus the MAESTRO dataset, for a vocab size of 32k.
- A deberta-base-v3 model is trained on a mixture of MAESTRO and 100M tokens for 3 epochs.
- The model continues training for 100,000 steps with a sequence length of 128.
- The model continues training for 65,000 steps with a sequence length of 512.
- The model is trained with targeted linguistic masking for 1 epoch.
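The last step above can be sketched as follows. This is a minimal illustration of the general idea of targeted linguistic masking — masking linguistically interesting positions at a higher rate than the rest — not the actual training code; the function name, masking rates, and token ids are placeholders.

```python
import random

def targeted_mask(token_ids, targeted_ids, mask_id, p_target=0.5, p_other=0.1, rng=None):
    """Sketch of targeted linguistic masking (assumed form, not the
    released implementation): tokens in the targeted set are replaced
    with the mask id at rate p_target, all others at rate p_other."""
    rng = rng or random.Random(0)  # fixed seed for a reproducible demo
    out = []
    for tid in token_ids:
        p = p_target if tid in targeted_ids else p_other
        out.append(mask_id if rng.random() < p else tid)
    return out

# Toy usage: id 7 stands in for a linguistically targeted token,
# and 103 stands in for the [MASK] id.
masked = targeted_mask([5, 7, 9, 7, 11], targeted_ids={7}, mask_id=103)
```

In a real setup the targeted set would come from a linguistic analysis of the corpus (e.g. a list of token ids for a word class of interest), and the masked positions would feed the usual masked-language-modeling loss.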
This README will be updated with more details soon.