Hubert-common_voice-phoneme-debug-warmup500

This model is a fine-tuned version of rinna/japanese-hubert-base on the mozilla-foundation/common_voice_13_0 Japanese (JA) subset. It achieves the following results on the evaluation set:

  • Loss: 2.9679
  • WER: 1.0
  • CER: 0.9851
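
Below is a minimal loading/inference sketch. It assumes the checkpoint exposes a CTC head (HubertForCTC) with a matching processor config; the repo id is this model's own name, and the audio path is a placeholder. Note that with a WER of 1.0, outputs from this debug checkpoint are unlikely to be usable; the sketch is illustrative only.

```python
# A minimal inference sketch, assuming the checkpoint exposes a CTC head
# (HubertForCTC) and a matching processor config; "sample.wav" is a
# placeholder for any 16 kHz mono recording.
import torch
import librosa
from transformers import AutoProcessor, HubertForCTC

model_id = "utakumi/Hubert-common_voice-phoneme-debug-warmup500"
processor = AutoProcessor.from_pretrained(model_id)
model = HubertForCTC.from_pretrained(model_id)

speech, _ = librosa.load("sample.wav", sr=16_000)  # placeholder audio file
inputs = processor(speech, sampling_rate=16_000, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

pred_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(pred_ids))  # greedy-decoded phoneme string
```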

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged configuration sketch follows the list):

  • learning_rate: 0.0003
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 500
  • num_epochs: 30.0
  • mixed_precision_training: Native AMP
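
The sketch below shows how these values could map onto transformers.TrainingArguments. Only the hyperparameters listed above come from the card; the output directory, evaluation cadence (every 100 steps, inferred from the results table), and logging cadence (every 500 steps, inferred from the "No log" entries) are assumptions, not settings taken from the original training script.

```python
# Hypothetical reconstruction of the run's configuration; only the values
# listed above are taken from the card, everything else is assumed.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="Hubert-common_voice-phoneme-debug-warmup500",  # assumed
    learning_rate=3e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=2,  # 16 x 2 = total train batch size of 32
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_steps=500,
    num_train_epochs=30.0,
    fp16=True,                      # native AMP mixed-precision training
    eval_strategy="steps",          # inferred: eval reported every 100 steps
    eval_steps=100,
    logging_steps=500,              # inferred: "No log" before step 500
)
```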

Training results

| Training Loss | Epoch   | Step | Validation Loss | WER | CER    |
|:-------------:|:-------:|:----:|:---------------:|:---:|:------:|
| No log        | 0.7092  | 100  | 4.5669          | 1.0 | 0.9851 |
| No log        | 1.4184  | 200  | 3.0119          | 1.0 | 0.9851 |
| No log        | 2.1277  | 300  | 2.9840          | 1.0 | 0.9851 |
| No log        | 2.8369  | 400  | 2.9764          | 1.0 | 0.9851 |
| 3.973         | 3.5461  | 500  | 2.9796          | 1.0 | 0.9851 |
| 3.973         | 4.2553  | 600  | 2.9758          | 1.0 | 0.9851 |
| 3.973         | 4.9645  | 700  | 2.9691          | 1.0 | 0.9851 |
| 3.973         | 5.6738  | 800  | 2.9858          | 1.0 | 0.9850 |
| 3.973         | 6.3830  | 900  | 2.9692          | 1.0 | 0.9851 |
| 2.9654        | 7.0922  | 1000 | 2.9895          | 1.0 | 0.9850 |
| 2.9654        | 7.8014  | 1100 | 2.9725          | 1.0 | 0.9850 |
| 2.9654        | 8.5106  | 1200 | 2.9713          | 1.0 | 0.9850 |
| 2.9654        | 9.2199  | 1300 | 2.9758          | 1.0 | 0.9851 |
| 2.9654        | 9.9291  | 1400 | 2.9784          | 1.0 | 0.9850 |
| 2.9643        | 10.6383 | 1500 | 2.9687          | 1.0 | 0.9851 |
| 2.9643        | 11.3475 | 1600 | 2.9779          | 1.0 | 0.9851 |
| 2.9643        | 12.0567 | 1700 | 2.9679          | 1.0 | 0.9850 |
| 2.9643        | 12.7660 | 1800 | 2.9769          | 1.0 | 0.9851 |
| 2.9643        | 13.4752 | 1900 | 2.9718          | 1.0 | 0.9851 |
| 2.9631        | 14.1844 | 2000 | 2.9686          | 1.0 | 0.9851 |
| 2.9631        | 14.8936 | 2100 | 2.9706          | 1.0 | 0.9850 |
| 2.9631        | 15.6028 | 2200 | 2.9791          | 1.0 | 0.9851 |
| 2.9631        | 16.3121 | 2300 | 2.9731          | 1.0 | 0.9851 |
| 2.9631        | 17.0213 | 2400 | 2.9722          | 1.0 | 0.9850 |
| 2.9627        | 17.7305 | 2500 | 2.9723          | 1.0 | 0.9851 |
| 2.9627        | 18.4397 | 2600 | 2.9689          | 1.0 | 0.9851 |
| 2.9627        | 19.1489 | 2700 | 2.9747          | 1.0 | 0.9851 |
| 2.9627        | 19.8582 | 2800 | 2.9801          | 1.0 | 0.9851 |
| 2.9627        | 20.5674 | 2900 | 2.9740          | 1.0 | 0.9851 |
| 2.9622        | 21.2766 | 3000 | 2.9736          | 1.0 | 0.9850 |
| 2.9622        | 21.9858 | 3100 | 2.9719          | 1.0 | 0.9851 |
| 2.9622        | 22.6950 | 3200 | 2.9710          | 1.0 | 0.9850 |
| 2.9622        | 23.4043 | 3300 | 2.9714          | 1.0 | 0.9850 |
| 2.9622        | 24.1135 | 3400 | 2.9701          | 1.0 | 0.9851 |
| 2.9609        | 24.8227 | 3500 | 2.9695          | 1.0 | 0.9851 |
| 2.9609        | 25.5319 | 3600 | 2.9669          | 1.0 | 0.9850 |
| 2.9609        | 26.2411 | 3700 | 2.9774          | 1.0 | 0.9851 |
| 2.9609        | 26.9504 | 3800 | 2.9712          | 1.0 | 0.9851 |
| 2.9609        | 27.6596 | 3900 | 2.9701          | 1.0 | 0.9851 |
| 2.962         | 28.3688 | 4000 | 2.9689          | 1.0 | 0.9851 |
| 2.962         | 29.0780 | 4100 | 2.9738          | 1.0 | 0.9850 |
| 2.962         | 29.7872 | 4200 | 2.9678          | 1.0 | 0.9851 |
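
For reference, the WER and CER columns above can be computed with the Hugging Face evaluate library, as in the small sketch below; the reference and prediction strings are toy stand-ins, not data from this run.

```python
# Toy demonstration of the metrics reported above using the `evaluate`
# library; the strings below are illustrative, not taken from the dataset.
import evaluate

wer_metric = evaluate.load("wer")
cer_metric = evaluate.load("cer")

references = ["k o n n i ch i w a"]   # toy space-separated phoneme reference
predictions = ["k o n n i ch i w a"]  # toy model output

print("WER:", wer_metric.compute(references=references, predictions=predictions))
print("CER:", cer_metric.compute(references=references, predictions=predictions))
```

A WER of 1.0, as reported throughout the table, means essentially no reference token was recovered, which is consistent with the validation loss staying flat near 2.97 for the entire run.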

Framework versions

  • Transformers 4.47.0.dev0
  • Pytorch 2.5.1+cu124
  • Datasets 3.1.0
  • Tokenizers 0.20.3