---
library_name: transformers
license: mit
base_model: microsoft/speecht5_tts
tags:
- generated_from_trainer
model-index:
- name: speecht5_finetuned_commonvoice_dv
  results: []
---


# speecht5_finetuned_commonvoice_dv

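This model is a fine-tuned version of [microsoft/speecht5_tts](https://huggingface.co/microsoft/speecht5_tts) on a dataset that the Trainer did not record. The model name suggests Common Voice (`dv`), but the card does not confirm this.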
It achieves the following results on the evaluation set:
- Loss: 0.4334

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 32
- eval_batch_size: 16
- seed: 42
- gradient_accumulation_steps: 12
- total_train_batch_size: 384
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- training_steps: 4000
- mixed_precision_training: Native AMP
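
As a rough illustration of how these values fit together, the sketch below restates them as a plain dictionary and derives the total train batch size and the learning rate at a given step. The dictionary keys mirror `TrainingArguments` parameter names, but `hparams`, `effective_batch_size`, and `linear_lr` are illustrative names, not part of the training script. It assumes a single device (consistent with 32 × 12 = 384), that "Native AMP" corresponds to `fp16=True`, and that the linear schedule warms up from zero and decays to zero at the final step.

```python
# Hyperparameters copied from the list above. Keys follow the parameter
# names used by transformers' TrainingArguments.
hparams = {
    "learning_rate": 1e-5,
    "per_device_train_batch_size": 32,
    "per_device_eval_batch_size": 16,
    "seed": 42,
    "gradient_accumulation_steps": 12,
    "optim": "adamw_torch",
    "lr_scheduler_type": "linear",
    "warmup_steps": 500,
    "max_steps": 4000,
    "fp16": True,  # assumption: "Native AMP" was enabled via fp16=True
}


def effective_batch_size(h, num_devices=1):
    """Examples consumed per optimizer step (num_devices=1 is an assumption)."""
    return (
        h["per_device_train_batch_size"]
        * h["gradient_accumulation_steps"]
        * num_devices
    )


def linear_lr(step, h):
    """Learning rate at an optimizer step: linear warmup, then linear decay to 0."""
    peak, warmup, total = h["learning_rate"], h["warmup_steps"], h["max_steps"]
    if step < warmup:
        return peak * step / warmup
    return peak * max(0.0, (total - step) / (total - warmup))


if __name__ == "__main__":
    print("total train batch size:", effective_batch_size(hparams))
    for step in (0, 250, 500, 2250, 4000):
        print(f"step {step:>4}: lr = {linear_lr(step, hparams):.2e}")
```

Under these assumptions the learning rate peaks at step 500, the end of warmup, and reaches zero at step 4000.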

### Training results

| Training Loss | Epoch    | Step | Validation Loss |
|:-------------:|:--------:|:----:|:---------------:|
| 10.3761       | 3.2222   | 100  | 0.7551          |
| 8.3203        | 6.4444   | 200  | 0.5722          |
| 7.3507        | 9.6667   | 300  | 0.5280          |
| 6.9851        | 12.8889  | 400  | 0.5115          |
| 6.6688        | 16.1270  | 500  | 0.4952          |
| 6.479         | 19.3492  | 600  | 0.4871          |
| 6.4798        | 22.5714  | 700  | 0.4771          |
| 6.2714        | 25.7937  | 800  | 0.4759          |
| 6.2132        | 29.0317  | 900  | 0.4700          |
| 6.1966        | 32.2540  | 1000 | 0.4652          |
| 6.1389        | 35.4762  | 1100 | 0.4638          |
| 6.0647        | 38.6984  | 1200 | 0.4603          |
| 6.032         | 41.9206  | 1300 | 0.4602          |
| 6.0107        | 45.1587  | 1400 | 0.4552          |
| 5.9762        | 48.3810  | 1500 | 0.4522          |
| 6.0347        | 51.6032  | 1600 | 0.4507          |
| 5.9424        | 54.8254  | 1700 | 0.4508          |
| 5.9278        | 58.0635  | 1800 | 0.4522          |
| 5.9332        | 61.2857  | 1900 | 0.4473          |
| 5.9201        | 64.5079  | 2000 | 0.4443          |
| 5.8812        | 67.7302  | 2100 | 0.4439          |
| 5.8007        | 70.9524  | 2200 | 0.4426          |
| 5.8262        | 74.1905  | 2300 | 0.4409          |
| 5.8343        | 77.4127  | 2400 | 0.4404          |
| 5.8536        | 80.6349  | 2500 | 0.4408          |
| 5.7672        | 83.8571  | 2600 | 0.4381          |
| 5.757         | 87.0952  | 2700 | 0.4381          |
| 5.7981        | 90.3175  | 2800 | 0.4366          |
| 5.8329        | 93.5397  | 2900 | 0.4371          |
| 5.7738        | 96.7619  | 3000 | 0.4365          |
| 5.7674        | 99.9841  | 3100 | 0.4370          |
| 5.7987        | 103.2222 | 3200 | 0.4356          |
| 5.6883        | 106.4444 | 3300 | 0.4351          |
| 5.7883        | 109.6667 | 3400 | 0.4374          |
| 5.7269        | 112.8889 | 3500 | 0.4345          |
| 5.723         | 116.1270 | 3600 | 0.4336          |
| 5.7776        | 119.3492 | 3700 | 0.4354          |
| 5.724         | 122.5714 | 3800 | 0.4342          |
| 5.7235        | 125.7937 | 3900 | 0.4334          |
| 5.7067        | 129.0317 | 4000 | 0.4334          |
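
To put the validation curve in perspective, the following minimal sketch computes the relative drop in validation loss between a few logged steps. The values are copied from selected rows of the table above; `val_loss` and `relative_drop` are illustrative names only.

```python
# (step -> validation loss) pairs copied from selected rows of the table above.
val_loss = {100: 0.7551, 1000: 0.4652, 2000: 0.4443, 3000: 0.4365, 4000: 0.4334}


def relative_drop(start_step, end_step, losses=val_loss):
    """Fractional decrease in validation loss between two logged steps."""
    start, end = losses[start_step], losses[end_step]
    return (start - end) / start


overall = relative_drop(100, 4000)
last_1000 = relative_drop(3000, 4000)

if __name__ == "__main__":
    print(f"steps 100-4000:  {overall:.1%}")
    print(f"steps 3000-4000: {last_1000:.1%}")
```

Comparing the two figures gives a quick sense of how much of the improvement happened early in training versus in the final 1000 steps.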


### Framework versions

- Transformers 4.48.0.dev0
- Pytorch 2.5.1+cu121
- Datasets 3.2.0
- Tokenizers 0.21.0