---
license: mit
language:
- en
pipeline_tag: automatic-speech-recognition
---
# About 
This model was created to support experiments evaluating phonetic transcription 
with the Buckeye corpus, as part of https://github.com/ginic/multipa/tree/buckeye_experiments. 
It is a version of facebook/wav2vec2-large-xlsr-53 fine-tuned on a specific subset of the Buckeye corpus.
For details about specific model parameters, see the config.json in this repository or the 
training scripts in scripts/buckeye_experiments on the `buckeye_experiments` branch of the GitHub repository. 

# Experiment Details
Models are still trained on a total amount of data equal to half the full training data (4000 examples), but the gender split is varied between 30% and 70% female, with examples drawn from all individuals. Five models are trained for each gender split, using the same model parameters but different data seeds. 

Goals: 
- Determine how the gender split in the training data affects performance

Params to vary: 
- percent female (--percent_female): [0.3, 0.7]
- training seed (--train_seed): [359, 130, 809, 700, 114]
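
The parameter grid above can be enumerated with a short sketch like the one below. It is illustrative only and is not taken from the training scripts: the function name `experiment_grid` is hypothetical, the 4000 figure is read as the per-model training set size, and the per-gender counts assume the split is applied by number of examples, which may differ from how the actual scripts sample data.

```python
from itertools import product

# Values taken from the experiment description above.
TOTAL_EXAMPLES = 4000  # assumed to be the per-model training set size
PERCENT_FEMALE = [0.3, 0.7]
TRAIN_SEEDS = [359, 130, 809, 700, 114]


def experiment_grid(total=TOTAL_EXAMPLES):
    """Return one dict per (percent_female, train_seed) combination.

    Hypothetical helper: the per-gender counts assume the split is applied
    to the number of examples, which the real scripts may not do.
    """
    runs = []
    for percent_female, train_seed in product(PERCENT_FEMALE, TRAIN_SEEDS):
        num_female = round(total * percent_female)
        runs.append(
            {
                "percent_female": percent_female,
                "train_seed": train_seed,
                "num_female": num_female,
                "num_male": total - num_female,
            }
        )
    return runs


if __name__ == "__main__":
    # Print the flag values for each run alongside the implied example counts.
    for run in experiment_grid():
        print(
            f"--percent_female {run['percent_female']} "
            f"--train_seed {run['train_seed']} "
            f"({run['num_female']} female / {run['num_male']} male examples)"
        )
```

Under these assumptions the grid contains 10 runs: five seeds for each of the two gender splits.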