wav2vec2-xls-r-300m-cv8-turkish
Model description
This ASR model is a fine-tuned version of facebook/wav2vec2-xls-r-300m for Turkish speech.
Training and evaluation data
The following dataset was used for fine-tuning:
- Common Voice 8.0 TR: all of the validated split except the test split was used for training.
Training procedure
To support the data above, custom pre-processing and loading steps were performed using the wav2vec2-turkish repo.
Training hyperparameters
The following hyperparameters were used for fine-tuning:
- learning_rate 2.5e-4
- num_train_epochs 20
- warmup_steps 500
- freeze_feature_extractor
- mask_time_prob 0.1
- mask_feature_prob 0.1
- feat_proj_dropout 0.05
- attention_dropout 0.05
- final_dropout 0.1
- activation_dropout 0.05
- per_device_train_batch_size 8
- per_device_eval_batch_size 8
- gradient_accumulation_steps 8
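As a rough illustration of how the batch settings above combine, the sketch below computes the effective training batch size. The dictionary simply restates the listed values; the `effective_batch_size` helper and its `num_devices` argument are hypothetical, since the number of GPUs used for training is not stated in this card.

```python
# Values copied from the hyperparameter list above.
# freeze_feature_extractor is listed without a value; True is an assumption.
hparams = {
    "learning_rate": 2.5e-4,
    "num_train_epochs": 20,
    "warmup_steps": 500,
    "freeze_feature_extractor": True,
    "mask_time_prob": 0.1,
    "mask_feature_prob": 0.1,
    "feat_proj_dropout": 0.05,
    "attention_dropout": 0.05,
    "final_dropout": 0.1,
    "activation_dropout": 0.05,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 8,
    "gradient_accumulation_steps": 8,
}


def effective_batch_size(hp, num_devices=1):
    """Samples contributing to one optimizer step.

    num_devices is a hypothetical parameter: the device count is not
    given in the model card.
    """
    return (
        hp["per_device_train_batch_size"]
        * hp["gradient_accumulation_steps"]
        * num_devices
    )


print(effective_batch_size(hparams))  # 8 * 8 * 1 = 64 on a single device
```

With gradient accumulation, each optimizer step therefore sees 64 utterances per device, even though only 8 are held in memory at a time.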
Framework versions
- Transformers 4.17.0.dev0
- Pytorch 1.10.1
- Datasets 1.17.0
- Tokenizers 0.10.3
Language Model
An n-gram language model was trained on Turkish Wikipedia articles using KenLM. The ngram-lm-wiki repo was used to generate the ARPA LM and convert it into binary format.
Evaluation Commands
Please install the unicode_tr package before running evaluation. It is used for Turkish text processing.
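To show why Turkish text needs dedicated handling, here is a minimal sketch of Turkish-aware lowercasing. It is a hand-rolled stand-in, not the unicode_tr implementation or API; the `turkish_lower` name is hypothetical. The point is that Python's default `str.lower()` maps dotless capital "I" to "i" and dotted capital "İ" to "i" plus a combining dot, neither of which matches Turkish orthography.

```python
# Map the two problematic capitals before falling back to str.lower().
# "I" (U+0049) lowercases to dotless "ı" (U+0131) in Turkish,
# and dotted "İ" (U+0130) lowercases to plain "i".
_TR_LOWER = str.maketrans({"I": "\u0131", "\u0130": "i"})


def turkish_lower(text: str) -> str:
    """Lowercase text using Turkish rules for I/İ (illustrative helper)."""
    return text.translate(_TR_LOWER).lower()


print("ISPARTA".lower())         # default rule gives "isparta"
print(turkish_lower("ISPARTA"))  # Turkish rule gives "ısparta"
```

If reference transcripts and model output are normalized with different casing rules, otherwise correct words count as errors, which inflates WER and CER.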
- To evaluate on mozilla-foundation/common_voice_8_0 with split test:
python eval.py --model_id mpoyraz/wav2vec2-xls-r-300m-cv8-turkish --dataset mozilla-foundation/common_voice_8_0 --config tr --split test
- To evaluate on speech-recognition-community-v2/dev_data:
python eval.py --model_id mpoyraz/wav2vec2-xls-r-300m-cv8-turkish --dataset speech-recognition-community-v2/dev_data --config tr --split validation --chunk_length_s 5.0 --stride_length_s 1.0
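The `--chunk_length_s 5.0 --stride_length_s 1.0` flags split long recordings into overlapping windows. The sketch below shows how such windows could be laid out, assuming the flags are forwarded to a chunked ASR pipeline that applies the stride on both sides of each chunk (so consecutive chunks start 3 s apart). The `chunk_windows` helper is hypothetical and works in seconds rather than samples; it is not taken from eval.py.

```python
def chunk_windows(duration_s, chunk_length_s=5.0, stride_length_s=1.0):
    """Return (start, end, keep_start, keep_end) tuples in seconds.

    Each chunk spans [start, end]; only [keep_start, keep_end] is kept,
    the strides on either side serve purely as context.
    """
    step = chunk_length_s - 2 * stride_length_s
    if step <= 0:
        raise ValueError("chunk_length_s must exceed twice stride_length_s")

    windows = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_length_s, duration_s)
        is_last = start + chunk_length_s >= duration_s
        # The first chunk has no left context to drop, the last no right.
        keep_start = start if start == 0.0 else start + stride_length_s
        keep_end = end if is_last else end - stride_length_s
        windows.append((start, end, keep_start, keep_end))
        if is_last:
            break
        start += step
    return windows


for window in chunk_windows(12.0):
    print(window)
```

For a 12 s recording this yields four chunks whose kept regions tile the audio without gaps, which is why the overlap does not duplicate transcribed text.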
Evaluation results:
| Dataset | WER (%) | CER (%) |
|---|---|---|
| Common Voice 8 TR test split | 10.61 | 2.67 |
| Speech Recognition Community dev data | 36.46 | 12.38 |
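For readers unfamiliar with the metrics, the sketch below implements the standard definitions: WER is the word-level edit distance divided by the number of reference words, and CER is the same at character level, both reported as percentages. This is an illustrative implementation, not the one used by eval.py; the function names are hypothetical, and counting spaces as characters in CER is an assumption.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (sub, del, ins cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate in percent; reference must be non-empty."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate in percent (spaces counted as characters)."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)


ref = "bugün hava çok güzel"
hyp = "bugün hava cok güzel"
print(wer(ref, hyp))  # one wrong word out of four
print(cer(ref, hyp))  # one wrong character out of twenty
```

A single wrong character changes one word out of four here, so WER is much larger than CER, which matches the pattern in the table above.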
Evaluation results
- Test WER on Common Voice 8 (self-reported): 10.61
- Test CER on Common Voice 8 (self-reported): 2.67
- Test WER on Robust Speech Event - Dev Data (self-reported): 36.46
- Test CER on Robust Speech Event - Dev Data (self-reported): 12.38
- Test WER on Robust Speech Event - Test Data (self-reported): 40.91