
wav2vec2-conformer-rope-jv-openslr

This model is a fine-tuned version of facebook/wav2vec2-conformer-rope-large on the OpenSLR 41 (Javanese) dataset. It achieves the following results on the evaluation set:

  • Loss: 0.2555
  • WER: 0.1296

Model description

The model is a fine-tuned version of wav2vec2-conformer-rope-large, specifically adapted using the OpenSLR 41 dataset, which is focused on the Javanese language domain. This adaptation enables the model to effectively recognize and process spoken Javanese, leveraging the robust capabilities of the wav2vec2-conformer-rope-large architecture combined with domain-specific training data.

Intended uses & limitations

This model is intended for transcribing spoken Javanese from audio recordings. It achieves a Word Error Rate (WER) of roughly 13%, indicating that while the model performs reasonably well, it still produces noticeable transcription errors. Accuracy may vary, particularly with challenging audio conditions or less common dialects. Additionally, the model requires input audio sampled at 16 kHz, so recordings at other sample rates must be resampled before inference.
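Because the model expects 16 kHz mono float audio, inputs at other rates need resampling first. Below is a minimal sketch: the `resample_linear` helper is an illustrative linear-interpolation resampler (in practice a library resampler such as `torchaudio.functional.resample` is preferable), and the guarded section assumes `transformers` and `torch` are installed and uses random noise as a stand-in for a real recording.

```python
import numpy as np

TARGET_SR = 16_000  # the model requires 16 kHz input


def resample_linear(audio: np.ndarray, orig_sr: int,
                    target_sr: int = TARGET_SR) -> np.ndarray:
    """Naive linear-interpolation resampler (illustrative only)."""
    if orig_sr == target_sr:
        return audio
    duration = len(audio) / orig_sr
    n_out = int(round(duration * target_sr))
    t_out = np.linspace(0.0, duration, n_out, endpoint=False)
    t_in = np.arange(len(audio)) / orig_sr
    return np.interp(t_out, t_in, audio).astype(np.float32)


if __name__ == "__main__":
    # Heavy imports kept under the guard; requires transformers + torch.
    import torch
    from transformers import AutoProcessor, Wav2Vec2ConformerForCTC

    model_id = "johaness14/wav2vec2-conformer-rope-jv-openslr"
    processor = AutoProcessor.from_pretrained(model_id)
    model = Wav2Vec2ConformerForCTC.from_pretrained(model_id)

    # Stand-in waveform at 44.1 kHz, resampled down to 16 kHz.
    audio = resample_linear(np.random.randn(44_100).astype(np.float32),
                            orig_sr=44_100)
    inputs = processor(audio, sampling_rate=TARGET_SR, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    pred_ids = torch.argmax(logits, dim=-1)
    print(processor.batch_decode(pred_ids)[0])
```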

Training and evaluation data

The model was trained on the OpenSLR 41 dataset, split into two sections (training and testing). Training ran on a single A100 GPU; the total training duration was not recorded.

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 1000
  • num_epochs: 85
  • mixed_precision_training: Native AMP
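With a linear scheduler and 1000 warmup steps, the learning rate ramps from 0 to the peak of 1e-4 over the first 1000 steps, then decays linearly to 0 by the final step. A minimal sketch of that schedule (mirroring the behavior of `get_linear_schedule_with_warmup` in transformers; `total_steps=60_000` is taken from the last logged step in the training table):

```python
def linear_warmup_lr(step: int, peak_lr: float = 1e-4,
                     warmup_steps: int = 1000,
                     total_steps: int = 60_000) -> float:
    """Learning rate at `step` under linear warmup followed by linear decay."""
    if step < warmup_steps:
        # Ramp up proportionally during warmup.
        return peak_lr * step / warmup_steps
    # Linear decay from peak_lr at `warmup_steps` to 0 at `total_steps`.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / (total_steps - warmup_steps)
```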

Training results

| Training Loss | Epoch   | Step  | Validation Loss | WER    |
|--------------:|--------:|------:|----------------:|-------:|
| 0.6796        | 2.8329  | 2000  | 0.5100          | 0.5010 |
| 0.4236        | 5.6657  | 4000  | 0.3792          | 0.3598 |
| 0.318         | 8.4986  | 6000  | 0.3244          | 0.2846 |
| 0.2444        | 11.3314 | 8000  | 0.3026          | 0.2674 |
| 0.1916        | 14.1643 | 10000 | 0.2682          | 0.2364 |
| 0.1588        | 16.9972 | 12000 | 0.2762          | 0.2398 |
| 0.1338        | 19.8300 | 14000 | 0.2623          | 0.2116 |
| 0.1201        | 22.6629 | 16000 | 0.2672          | 0.2081 |
| 0.1005        | 25.4958 | 18000 | 0.2596          | 0.1978 |
| 0.0921        | 28.3286 | 20000 | 0.2595          | 0.1881 |
| 0.0853        | 31.1615 | 22000 | 0.2671          | 0.1730 |
| 0.0761        | 33.9943 | 24000 | 0.2588          | 0.1744 |
| 0.0689        | 36.8272 | 26000 | 0.2490          | 0.1668 |
| 0.0646        | 39.6601 | 28000 | 0.2630          | 0.1633 |
| 0.0615        | 42.4929 | 30000 | 0.2677          | 0.1688 |
| 0.0563        | 45.3258 | 32000 | 0.2627          | 0.1585 |
| 0.0524        | 48.1586 | 34000 | 0.2497          | 0.1468 |
| 0.0511        | 50.9915 | 36000 | 0.2520          | 0.1516 |
| 0.0486        | 53.8244 | 38000 | 0.2418          | 0.1544 |
| 0.0415        | 56.6572 | 40000 | 0.2571          | 0.1489 |
| 0.0409        | 59.4901 | 42000 | 0.2687          | 0.1502 |
| 0.0361        | 62.3229 | 44000 | 0.2542          | 0.1371 |
| 0.0346        | 65.1558 | 46000 | 0.2504          | 0.1344 |
| 0.0312        | 67.9887 | 48000 | 0.2603          | 0.1337 |
| 0.0307        | 70.8215 | 50000 | 0.2641          | 0.1254 |
| 0.0305        | 73.6544 | 52000 | 0.2675          | 0.1289 |
| 0.0265        | 76.4873 | 54000 | 0.2625          | 0.1261 |
| 0.0271        | 79.3201 | 56000 | 0.2573          | 0.1268 |
| 0.0257        | 82.1530 | 58000 | 0.2571          | 0.1241 |
| 0.0247        | 84.9858 | 60000 | 0.2555          | 0.1296 |

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.2.1+cu118
  • Datasets 2.20.0
  • Tokenizers 0.19.1
Model size: 593M parameters (F32, stored as Safetensors)
