license: mit
base_model: microsoft/speecht5_tts
tags:
  - generated_from_trainer
datasets:
  - common_voice_13_0
model-index:
  - name: speecht5_tts_commonvoice_it_v2
    results: []

speecht5_tts_commonvoice_it_v2

This model is a fine-tuned version of microsoft/speecht5_tts on the common_voice_13_0 dataset. It achieves the following results on the evaluation set:

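  • Loss: 0.5076 (validation loss at the last logged evaluation, step 15000; the lowest logged value was 0.5065 at step 13000)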

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-06
  • train_batch_size: 32
  • eval_batch_size: 4
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 1000
  • num_epochs: 3
  • mixed_precision_training: Native AMP
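With lr_scheduler_type: linear and 1000 warmup steps, the learning rate ramps linearly from 0 up to 1e-06 and then decays linearly back to 0 at the final optimizer step. The plain-Python sketch below reproduces that shape. It assumes the formula used by Transformers' `get_linear_schedule_with_warmup`. It also assumes a total of 15096 optimizer steps, which this card does not state: that figure is inferred from the results table, where step 15000 corresponds to epoch 2.9809 (about 5032 steps per epoch, times 3 epochs). `linear_warmup_lr` is a hypothetical helper, not part of any library.

```python
def linear_warmup_lr(step: int, base_lr: float = 1e-6,
                     warmup_steps: int = 1000,
                     total_steps: int = 15096) -> float:
    """Learning rate at a given optimizer step (hypothetical helper).

    total_steps=15096 is an assumption inferred from the results table,
    not a value reported in the model card.
    """
    if step < warmup_steps:
        # Warmup phase: ramp linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: fall linearly from base_lr to 0 at total_steps, never below 0.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    for s in (0, 500, 1000, 5000, 10000, 15000, 15096):
        print(f"step {s:>5}: lr = {linear_warmup_lr(s):.3e}")
```

Because the peak rate is only 1e-06, most of the run uses a learning rate well below 1e-06. By the last logged evaluation at step 15000 it is under 1e-08.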

Training results

Training Loss   Epoch    Step    Validation Loss
-------------   ------   -----   ---------------
0.9213          0.0994   500     0.7823
0.8356          0.1987   1000    0.7026
0.6804          0.2981   1500    0.6003
0.6518          0.3975   2000    0.5751
0.6242          0.4968   2500    0.5594
0.6237          0.5962   3000    0.5514
0.6122          0.6955   3500    0.5414
0.597           0.7949   4000    0.5335
0.5909          0.8943   4500    0.5322
0.6009          0.9936   5000    0.5283
0.6086          1.0930   5500    0.5258
0.5812          1.1924   6000    0.5209
0.5868          1.2917   6500    0.5191
0.5689          1.3911   7000    0.5177
0.5777          1.4905   7500    0.5182
0.577           1.5898   8000    0.5169
0.5594          1.6892   8500    0.5150
0.5728          1.7886   9000    0.5144
0.571           1.8879   9500    0.5125
0.5739          1.9873   10000   0.5116
0.5819          2.0866   10500   0.5102
0.5633          2.1860   11000   0.5102
0.5635          2.2854   11500   0.5093
0.5809          2.3847   12000   0.5094
0.5647          2.4841   12500   0.5086
0.5593          2.5835   13000   0.5065
0.5639          2.6828   13500   0.5077
0.5511          2.7822   14000   0.5073
0.5534          2.8816   14500   0.5071
0.5532          2.9809   15000   0.5076
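The Epoch column is the Step column divided by the number of optimizer steps per epoch, so the logged rows pin that number down even though the card does not report it. The snippet below searches for the integer steps-per-epoch value that reproduces every logged Epoch to four decimals, then picks the logged checkpoint with the lowest validation loss. Only the (step, epoch, validation loss) values are copied from the table. The helper names `steps_per_epoch_candidates` and `best_checkpoint` are hypothetical.

```python
# (step, epoch, validation_loss) rows copied from the "Training results" table.
LOG = [
    (500, 0.0994, 0.7823), (1000, 0.1987, 0.7026), (1500, 0.2981, 0.6003),
    (2000, 0.3975, 0.5751), (2500, 0.4968, 0.5594), (3000, 0.5962, 0.5514),
    (3500, 0.6955, 0.5414), (4000, 0.7949, 0.5335), (4500, 0.8943, 0.5322),
    (5000, 0.9936, 0.5283), (5500, 1.0930, 0.5258), (6000, 1.1924, 0.5209),
    (6500, 1.2917, 0.5191), (7000, 1.3911, 0.5177), (7500, 1.4905, 0.5182),
    (8000, 1.5898, 0.5169), (8500, 1.6892, 0.5150), (9000, 1.7886, 0.5144),
    (9500, 1.8879, 0.5125), (10000, 1.9873, 0.5116), (10500, 2.0866, 0.5102),
    (11000, 2.1860, 0.5102), (11500, 2.2854, 0.5093), (12000, 2.3847, 0.5094),
    (12500, 2.4841, 0.5086), (13000, 2.5835, 0.5065), (13500, 2.6828, 0.5077),
    (14000, 2.7822, 0.5073), (14500, 2.8816, 0.5071), (15000, 2.9809, 0.5076),
]

NUM_EPOCHS = 3  # num_epochs from the hyperparameter list


def steps_per_epoch_candidates(log, max_steps=20000):
    """Integer steps-per-epoch values that reproduce every logged Epoch value."""
    return [
        n for n in range(1, max_steps + 1)
        if all(round(step / n, 4) == epoch for step, epoch, _ in log)
    ]


def best_checkpoint(log):
    """Logged row with the lowest validation loss."""
    return min(log, key=lambda row: row[2])


if __name__ == "__main__":
    candidates = steps_per_epoch_candidates(LOG)
    print("steps per epoch:", candidates)
    if len(candidates) == 1:
        # Assumes all 3 epochs ran to completion.
        print("implied total optimizer steps:", candidates[0] * NUM_EPOCHS)
    step, epoch, loss = best_checkpoint(LOG)
    print(f"lowest validation loss {loss} at step {step} (epoch {epoch})")
```

For this table the only consistent value is 5032 steps per epoch, or about 15096 optimizer steps over 3 epochs. The lowest logged validation loss, 0.5065 at step 13000, is slightly below the final 0.5076, so validation loss had essentially plateaued over the last epoch.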

Framework versions

  • Transformers 4.43.1
  • Pytorch 2.4.1+cu121
  • Datasets 3.0.0
  • Tokenizers 0.19.1