---
language:
- en
- it
license: apache-2.0
base_model: openai/whisper-small
tags:
- hf-asr-leaderboard
- generated_from_trainer
datasets:
- screevoai/code-switch
metrics:
- wer
model-index:
- name: Heero-STT-Model
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Code-Switch Dataset
      type: screevoai/code-switch
      config: None
      split: None
      args: None
    metrics:
    - name: Wer
      type: wer
      value: 4.446809768789546
---

# Heero-STT-Model

This model is a fine-tuned version of [openai/whisper-small](https://huggingface.co/openai/whisper-small) on the screevoai/code-switch dataset.
It achieves the following results on the evaluation set:

- Loss: 0.0895
- Wer: 4.4468

### Training results

| Training Loss | Epoch | Step | Validation Loss | Wer    |
|:-------------:|:-----:|:----:|:---------------:|:------:|
| 0.0345        | 3     | 1250 | 0.0895          | 4.4468 |

### Libraries to Install

- `pip install transformers datasets safetensors librosa huggingface-hub`

### Authentication needed before running the script

Run the following command in the terminal or in a Jupyter notebook:

- Terminal: `huggingface-cli login`
- Jupyter notebook:

```python
>>> from huggingface_hub import notebook_login
>>> notebook_login()
```

**NOTE:** Copy and paste the token from your Hugging Face account: Settings > Access Tokens > Create a new token, or copy an existing one.

### Script

```python
>>> from transformers import WhisperProcessor, WhisperForConditionalGeneration
>>> from datasets import load_dataset
>>> import librosa
>>> import requests
>>> from io import BytesIO

>>> # Load the model and processor
>>> processor = WhisperProcessor.from_pretrained("screevoai/heero-small-v1")
>>> model = WhisperForConditionalGeneration.from_pretrained("screevoai/heero-small-v1")
>>> model.config.forced_decoder_ids = None

>>> # Load the dataset
>>> ds = load_dataset("screevoai/code-switch", split="test")
>>> sample_url = ds[2]["audio_file_path"]  # change the index to test a different audio file

>>> # Download the audio file
>>> response = requests.get(sample_url)
>>> audio_file_data = BytesIO(response.content)

>>> # Resample the audio to 16 kHz, the sampling rate Whisper expects
>>> audio, sr = librosa.load(audio_file_data, sr=None)
>>> audio_resampled = librosa.resample(audio, orig_sr=sr, target_sr=16000)
>>> processed_audio = processor(audio_resampled, sampling_rate=16000, return_tensors="pt")
>>> input_features = processed_audio["input_features"]

>>> # Generate the transcription
>>> output_ids = model.generate(input_features, max_new_tokens=400)
>>> transcription = processor.batch_decode(output_ids, skip_special_tokens=True)[0]
>>> print(transcription)
```
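
### Running on GPU (optional)

The script above runs on CPU by default. If a CUDA-capable GPU is available, generation is considerably faster. Below is a minimal sketch of the same generation step on GPU, assuming a PyTorch build with CUDA support and the `model`, `input_features`, and `processor` objects from the script above:

```python
>>> import torch

>>> # Move the model and the input features to the GPU when one is available
>>> device = "cuda" if torch.cuda.is_available() else "cpu"
>>> model = model.to(device)
>>> output_ids = model.generate(input_features.to(device), max_new_tokens=400)
>>> transcription = processor.batch_decode(output_ids, skip_special_tokens=True)[0]
>>> print(transcription)
```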
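
### Scoring transcriptions with WER

The Wer reported above is a word error rate expressed as a percentage. To score the model's transcriptions against reference texts of your own, here is a minimal sketch using the `evaluate` library (an assumption on our part, not necessarily how the reported value was computed; it additionally requires `pip install evaluate jiwer`, and the example strings are hypothetical):

```python
>>> import evaluate

>>> # WER = (substitutions + deletions + insertions) / number of words in the reference
>>> wer_metric = evaluate.load("wer")
>>> predictions = ["hello world how are you"]        # hypothetical model transcription
>>> references = ["hello world how are you today"]   # hypothetical ground-truth text
>>> print(100 * wer_metric.compute(predictions=predictions, references=references))  # one deletion over six words, ~16.67
```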