T5-based Audio Transcription Fusion Model

This model takes several candidate transcriptions of the same audio, separated by '/', and generates a single fused transcription. It is fine-tuned on a dataset in which each sample has three candidate transcriptions and one reference transcription.
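As a rough illustration of the input format, the sketch below joins candidate transcriptions into one string. The card only says the candidates are separated by '/', so the surrounding spaces, the whitespace cleanup, and the helper name `build_fusion_input` are assumptions made for this example, not the author's preprocessing.

```python
def build_fusion_input(candidates, sep=" / "):
    """Join candidate transcriptions into a single model input string.

    Assumption: candidates are separated by '/' with a space on each side.
    The card does not specify the exact spacing or any task prefix.
    """
    # Collapse repeated whitespace and drop empty candidates.
    cleaned = [" ".join(c.split()) for c in candidates if c and c.strip()]
    if not cleaned:
        raise ValueError("at least one non-empty candidate is required")
    return sep.join(cleaned)


example = build_fusion_input([
    "the quick brown fox",
    "the quick brown box",
    "a quick  brown fox",
])
print(example)
```

The resulting string would then be tokenized and passed to the fine-tuned T5 model for generation, for example through the Hugging Face `transformers` library.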

Training Details

The model was trained on 21,000 samples for 10 epochs, with T5-small as the base model.

Training Loss: 0.008486982434988022

Evaluation Details

Test Loss: 0.012162444764789124

Word Error Rate (WER): 0.1033691040678812
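WER is conventionally the word-level edit distance (substitutions, deletions, and insertions) divided by the number of words in the reference, so a value near 0.103 corresponds to roughly 10 word errors per 100 reference words. The card does not say which tool or text normalization produced the reported figure. The sketch below is a plain implementation for a single sentence pair, assuming whitespace tokenization and no normalization; `word_error_rate` is a name chosen for this example.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


print(word_error_rate("the quick brown fox", "the quick brown box"))
```

A corpus-level WER is usually computed by summing edit distances over all samples and dividing by the total number of reference words, rather than averaging per-sentence values.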


Model size: 60.5M params (Safetensors, tensor type F32)
