---
license: mit
datasets:
- mozilla-foundation/common_voice_17_0
language:
- en
- ta
metrics:
- wer
base_model:
- openai/whisper-small
pipeline_tag: automatic-speech-recognition
library_name: transformers
tags:
- language-identification
- speech-to-text
---

# Whisper-small-ta

This model is trained for voice-to-text transcription in Tamil.

## Model Overview

This model is fine-tuned from `openai/whisper-small` on the [Mozilla Common Voice 17.0](https://huggingface.co/datasets/mozilla-foundation/common_voice_17_0) dataset for language identification and transcription in **Tamil**. It is designed to accurately transcribe spoken audio into text and to identify whether the spoken language is Tamil.

### Key Features:
- **Language**: Tamil
- **Base Model**: Whisper-small from OpenAI
- **Dataset**: Mozilla Common Voice 17.0

## Intended Use

The model is designed for automatic speech recognition (ASR) in Tamil, making it suitable for transcription and language identification in real-time applications.

## Training Details

This model was fine-tuned on a subset of the Mozilla Common Voice dataset containing `53,468` samples.

### Fine-tuning Process:
- Fine-tuning was performed on `Whisper-small`, a smaller version of OpenAI's Whisper model, chosen for its reduced latency while remaining effective for low-resource languages.
- The model was trained for `2` epochs in a `Google Colab Pro` environment.

## Performance

The model achieved a **Word Error Rate (WER)** of `34%` on a validation set with `8` hours of audio. We expect further improvements with continued training.

## Usage

You can use this model with the following code:

```python
from transformers import WhisperForConditionalGeneration, WhisperProcessor
import torch
import librosa

model = WhisperForConditionalGeneration.from_pretrained("Lingalingeswaran/whisper-small-ta")
processor = WhisperProcessor.from_pretrained("Lingalingeswaran/whisper-small-ta")

# Load the audio file and resample to 16 kHz, the sampling rate Whisper expects
audio, _ = librosa.load("path_to_audio_file", sr=16000)

# Convert the waveform into log-Mel input features
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    predicted_ids = model.generate(inputs.input_features)

transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
print(transcription)
```
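If the input language is known in advance, decoding can be pinned to Tamil rather than relying on Whisper's automatic language detection. A minimal sketch, reusing the `model`, `processor`, and `inputs` objects from the snippet above:

```python
# Force Tamil transcription instead of letting Whisper auto-detect the language
forced_ids = processor.get_decoder_prompt_ids(language="tamil", task="transcribe")

with torch.no_grad():
    predicted_ids = model.generate(inputs.input_features, forced_decoder_ids=forced_ids)

print(processor.batch_decode(predicted_ids, skip_special_tokens=True))
```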
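For quick experiments, the checkpoint should also load through the high-level `pipeline` API, which handles audio decoding and resampling internally. A minimal sketch (assumes `ffmpeg` is available for reading the audio file):

```python
from transformers import pipeline

# Build an ASR pipeline directly from the fine-tuned checkpoint
asr = pipeline(
    "automatic-speech-recognition",
    model="Lingalingeswaran/whisper-small-ta",
)

result = asr("path_to_audio_file")
print(result["text"])
```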
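The WER figure reported above can be recomputed on your own data with the `evaluate` library. A minimal sketch; the transcript pairs below are hypothetical placeholders, not the actual validation set:

```python
import evaluate

# Load the word error rate metric
wer_metric = evaluate.load("wer")

# Hypothetical placeholders: pair each reference transcript with the
# model's prediction for the same audio clip
references = ["reference transcript one", "reference transcript two"]
predictions = ["predicted transcript one", "predicted transcript two"]

wer = wer_metric.compute(references=references, predictions=predictions)
print(f"WER: {wer:.2%}")
```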