speecht5_finetuned_commonvoice_dv

This model is a fine-tuned version of microsoft/speecht5_tts. The training dataset was not recorded (the card metadata lists it as None). It achieves the following results on the evaluation set:

Loss: 0.4334

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 1e-05
train_batch_size: 32
eval_batch_size: 16
seed: 42
gradient_accumulation_steps: 12
total_train_batch_size: 384
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 500
training_steps: 4000
mixed_precision_training: Native AMP
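
To make the relationship between these values concrete, here is a minimal sketch in plain Python showing how total_train_batch_size follows from the per-device batch size and gradient accumulation, and how a linear schedule with warmup would move the learning rate across the 4000 steps. The device count of 1 and the helper names are assumptions for illustration; the card itself does not state the hardware, and the schedule shape is the usual linear ramp and decay rather than something confirmed by the training logs.

```python
# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-5
TRAIN_BATCH_SIZE = 32
GRAD_ACCUM_STEPS = 12
WARMUP_STEPS = 500
TRAINING_STEPS = 4000
NUM_DEVICES = 1  # assumption: the card does not state the device count


def effective_batch_size(per_device, accum_steps, devices):
    """Number of samples contributing to one optimizer update."""
    return per_device * accum_steps * devices


def linear_lr(step, base_lr=LEARNING_RATE, warmup=WARMUP_STEPS, total=TRAINING_STEPS):
    """Learning rate at a given optimizer step under linear warmup and decay.

    Ramps linearly from 0 to base_lr over the warmup steps, then decays
    linearly back to 0 at the final training step.
    """
    if step < warmup:
        return base_lr * step / warmup
    return base_lr * max(0.0, (total - step) / (total - warmup))


if __name__ == "__main__":
    total = effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS, NUM_DEVICES)
    print(f"total_train_batch_size: {total}")
    for step in (0, 250, 500, 2250, 4000):
        print(f"step {step:>4}: lr = {linear_lr(step):.2e}")
```

Under the single-device assumption, 32 x 12 reproduces the reported total of 384, which suggests (but does not prove) that training ran on one device.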

Training Loss	Epoch	Step	Validation Loss
10.3761	3.2222	100	0.7551
8.3203	6.4444	200	0.5722
7.3507	9.6667	300	0.5280
6.9851	12.8889	400	0.5115
6.6688	16.1270	500	0.4952
6.479	19.3492	600	0.4871
6.4798	22.5714	700	0.4771
6.2714	25.7937	800	0.4759
6.2132	29.0317	900	0.4700
6.1966	32.2540	1000	0.4652
6.1389	35.4762	1100	0.4638
6.0647	38.6984	1200	0.4603
6.032	41.9206	1300	0.4602
6.0107	45.1587	1400	0.4552
5.9762	48.3810	1500	0.4522
6.0347	51.6032	1600	0.4507
5.9424	54.8254	1700	0.4508
5.9278	58.0635	1800	0.4522
5.9332	61.2857	1900	0.4473
5.9201	64.5079	2000	0.4443
5.8812	67.7302	2100	0.4439
5.8007	70.9524	2200	0.4426
5.8262	74.1905	2300	0.4409
5.8343	77.4127	2400	0.4404
5.8536	80.6349	2500	0.4408
5.7672	83.8571	2600	0.4381
5.757	87.0952	2700	0.4381
5.7981	90.3175	2800	0.4366
5.8329	93.5397	2900	0.4371
5.7738	96.7619	3000	0.4365
5.7674	99.9841	3100	0.4370
5.7987	103.2222	3200	0.4356
5.6883	106.4444	3300	0.4351
5.7883	109.6667	3400	0.4374
5.7269	112.8889	3500	0.4345
5.723	116.1270	3600	0.4336
5.7776	119.3492	3700	0.4354
5.724	122.5714	3800	0.4342
5.7235	125.7937	3900	0.4334
5.7067	129.0317	4000	0.4334
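
The table above can be summarized programmatically. The following is a rough sketch, not part of the original training code, that parses a handful of rows copied from the table and reports the step with the lowest validation loss and the relative reduction since the first evaluation. The `LogRow` type and the function names are hypothetical, and only a subset of rows is included to keep the example short.

```python
from typing import NamedTuple


class LogRow(NamedTuple):
    train_loss: float
    epoch: float
    step: int
    val_loss: float


# A subset of rows copied verbatim from the table above (tab-separated).
RAW_ROWS = """\
10.3761\t3.2222\t100\t0.7551
6.1966\t32.2540\t1000\t0.4652
5.9201\t64.5079\t2000\t0.4443
5.7738\t96.7619\t3000\t0.4365
5.7235\t125.7937\t3900\t0.4334
5.7067\t129.0317\t4000\t0.4334
"""


def parse_rows(text):
    """Turn tab-separated log lines into LogRow records."""
    rows = []
    for line in text.strip().splitlines():
        train_loss, epoch, step, val_loss = line.split("\t")
        rows.append(LogRow(float(train_loss), float(epoch), int(step), float(val_loss)))
    return rows


def best_checkpoint(rows):
    """Return the earliest row with the lowest validation loss."""
    return min(rows, key=lambda r: r.val_loss)


def relative_reduction(rows):
    """Fractional drop in validation loss from the first to the last row."""
    first, last = rows[0].val_loss, rows[-1].val_loss
    return (first - last) / first


if __name__ == "__main__":
    rows = parse_rows(RAW_ROWS)
    best = best_checkpoint(rows)
    print(f"lowest validation loss {best.val_loss} first reached at step {best.step}")
    print(f"relative reduction since step {rows[0].step}: {relative_reduction(rows):.1%}")
```

Because the validation loss is identical at steps 3900 and 4000, `min` returns the earlier of the two; most of the improvement happens in the first few hundred steps, after which the curve flattens.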