metadata

license: mit
base_model: microsoft/speecht5_tts
tags:
  - generated_from_trainer
datasets:
  - common_voice_13_0
model-index:
  - name: speecht5_tts_commonvoice_it_v2
    results: []

speecht5_tts_commonvoice_it_v2

This model is a fine-tuned version of microsoft/speecht5_tts on the common_voice_13_0 dataset. It achieves the following results on the evaluation set:

Loss: 0.5076 (validation loss at the final logged step, 15000)

Model description

More information needed

Intended uses & limitations

More information needed
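
For text-to-speech inference, the following is a minimal sketch rather than a documented usage recipe. It assumes the checkpoint loads with the standard SpeechT5 classes in Transformers, that the processor of the base model microsoft/speecht5_tts is still appropriate (the card does not say whether the tokenizer was changed), and that the caller supplies a 512-dimensional x-vector speaker embedding. The function name `synthesize` and the repository id placeholder are hypothetical; the namespace of this model is not given in the card.

```python
def synthesize(text, speaker_embedding,
               repo_id="<namespace>/speecht5_tts_commonvoice_it_v2"):
    """Return a 16 kHz waveform tensor for `text`.

    `speaker_embedding` is assumed to be a torch tensor of shape (1, 512).
    `repo_id` is a placeholder: replace <namespace> with the real owner.
    """
    # Imported lazily so this module can be loaded without the heavy dependencies.
    from transformers import (
        SpeechT5ForTextToSpeech,
        SpeechT5HifiGan,
        SpeechT5Processor,
    )

    # Assumption: the base model's processor matches the fine-tuned checkpoint.
    processor = SpeechT5Processor.from_pretrained("microsoft/speecht5_tts")
    model = SpeechT5ForTextToSpeech.from_pretrained(repo_id)
    # SpeechT5 predicts spectrograms; the HiFi-GAN vocoder turns them into audio.
    vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")

    inputs = processor(text=text, return_tensors="pt")
    return model.generate_speech(
        inputs["input_ids"], speaker_embedding, vocoder=vocoder
    )


# Example call (needs network access, torch, and a real speaker embedding):
#   speech = synthesize("Buongiorno a tutti.", my_xvector)
#   soundfile.write("out.wav", speech.numpy(), samplerate=16000)
```

The speaker embedding determines the voice, so output quality depends heavily on choosing an x-vector close to the speakers seen during fine-tuning.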

Training and evaluation data

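Fine-tuned and evaluated on common_voice_13_0. The "_it" suffix in the model name suggests the Italian subset; the exact configuration and splits are not specified.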

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-06
train_batch_size: 32
eval_batch_size: 4
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 1000
num_epochs: 3
mixed_precision_training: Native AMP
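
The learning-rate settings above (peak 1e-06, linear scheduler, 1000 warmup steps) imply a linear ramp followed by a linear decay. The sketch below reproduces that shape in plain Python. It assumes the usual linear warmup/decay rule and a total of about 15096 optimizer steps, which is an estimate from the step-to-epoch ratio in the training results (roughly 5032 steps per epoch over 3 epochs), not a value stated in this card. The function name `linear_warmup_decay_lr` is hypothetical.

```python
def linear_warmup_decay_lr(step, base_lr=1e-6, warmup_steps=1000,
                           total_steps=15096):
    """Learning rate at a given optimizer step.

    total_steps is an ESTIMATE (about 5032 steps/epoch * 3 epochs).
    """
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to base_lr.
        return base_lr * step / warmup_steps
    # Decay: fall linearly from base_lr to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / (total_steps - warmup_steps)


for s in (0, 500, 1000, 8000, 15000):
    print(s, f"{linear_warmup_decay_lr(s):.3e}")
```

Under these assumptions the rate peaks at step 1000 and is already close to zero by the last logged step (15000), which is consistent with the validation loss flattening out near the end of training.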

Training results

Training Loss	Epoch	Step	Validation Loss
0.9213	0.0994	500	0.7823
0.8356	0.1987	1000	0.7026
0.6804	0.2981	1500	0.6003
0.6518	0.3975	2000	0.5751
0.6242	0.4968	2500	0.5594
0.6237	0.5962	3000	0.5514
0.6122	0.6955	3500	0.5414
0.5970	0.7949	4000	0.5335
0.5909	0.8943	4500	0.5322
0.6009	0.9936	5000	0.5283
0.6086	1.0930	5500	0.5258
0.5812	1.1924	6000	0.5209
0.5868	1.2917	6500	0.5191
0.5689	1.3911	7000	0.5177
0.5777	1.4905	7500	0.5182
0.5770	1.5898	8000	0.5169
0.5594	1.6892	8500	0.5150
0.5728	1.7886	9000	0.5144
0.5710	1.8879	9500	0.5125
0.5739	1.9873	10000	0.5116
0.5819	2.0866	10500	0.5102
0.5633	2.1860	11000	0.5102
0.5635	2.2854	11500	0.5093
0.5809	2.3847	12000	0.5094
0.5647	2.4841	12500	0.5086
0.5593	2.5835	13000	0.5065
0.5639	2.6828	13500	0.5077
0.5511	2.7822	14000	0.5073
0.5534	2.8816	14500	0.5071
0.5532	2.9809	15000	0.5076
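
The table can be scanned for the checkpoint with the lowest validation loss. The sketch below copies the Step and Validation Loss columns from the table above and compares the best logged step with the final one. The variable names are illustrative, and the card does not say which checkpoint was actually published.

```python
# (step, validation loss) pairs copied from the training results table.
history = [
    (500, 0.7823), (1000, 0.7026), (1500, 0.6003), (2000, 0.5751),
    (2500, 0.5594), (3000, 0.5514), (3500, 0.5414), (4000, 0.5335),
    (4500, 0.5322), (5000, 0.5283), (5500, 0.5258), (6000, 0.5209),
    (6500, 0.5191), (7000, 0.5177), (7500, 0.5182), (8000, 0.5169),
    (8500, 0.5150), (9000, 0.5144), (9500, 0.5125), (10000, 0.5116),
    (10500, 0.5102), (11000, 0.5102), (11500, 0.5093), (12000, 0.5094),
    (12500, 0.5086), (13000, 0.5065), (13500, 0.5077), (14000, 0.5073),
    (14500, 0.5071), (15000, 0.5076),
]

# Lowest validation loss across all logged evaluations.
best = min(history, key=lambda row: row[1])
final = history[-1]

print(f"best:  step {best[0]}, val loss {best[1]:.4f}")   # best:  step 13000, val loss 0.5065
print(f"final: step {final[0]}, val loss {final[1]:.4f}")  # final: step 15000, val loss 0.5076
print(f"gap:   {final[1] - best[1]:.4f}")
```

The lowest logged validation loss (0.5065 at step 13000) is slightly below the reported final value of 0.5076, and the loss changes by less than 0.002 over the last 3000 steps.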

Framework versions

Transformers 4.43.1
PyTorch 2.4.1+cu121
Datasets 3.0.0
Tokenizers 0.19.1