---
language:
- en
- it
license: apache-2.0
base_model: openai/whisper-small
tags:
- hf-asr-leaderboard
- generated_from_trainer
datasets:
- screevoai/code-switch
metrics:
- wer
model-index:
- name: Heero-STT-Model
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Code-Switch Dataset
      type: screevoai/code-switch
      config: None
      split: None
      args: None
    metrics:
    - name: Wer
      type: wer
      value: 4.446809768789546
---

# Heero-STT-Model

This model is a fine-tuned version of [openai/whisper-small](https://huggingface.co/openai/whisper-small) on the screevoai/code-switch dataset.
It achieves the following results on the evaluation set:

- Loss: 0.0895
- Wer: 4.4468

### Training results

| Training Loss | Epoch | Step | Validation Loss | Wer    |
|:-------------:|:-----:|:----:|:---------------:|:------:|
| 0.0345        | 3     | 1250 | 0.0895          | 4.4468 |

### Libraries to Install

- `pip install transformers datasets safetensors librosa huggingface-hub`

### Authentication needed before running the script

Run the following command in the terminal or in a Jupyter notebook:

- Terminal: `huggingface-cli login`
- Jupyter notebook:

```python
>>> from huggingface_hub import notebook_login
>>> notebook_login()
```

**NOTE:** Copy and paste the token from your Hugging Face account: Settings > Access Tokens > Create a new token, or copy an existing one.

### Script

```python
>>> from transformers import WhisperProcessor, WhisperForConditionalGeneration
>>> from datasets import load_dataset
>>> import librosa
>>> import requests
>>> from io import BytesIO

>>> # Load the model and processor
>>> processor = WhisperProcessor.from_pretrained("screevoai/heero-small-v1")
>>> model = WhisperForConditionalGeneration.from_pretrained("screevoai/heero-small-v1")
>>> model.config.forced_decoder_ids = None

>>> # Load the dataset
>>> ds = load_dataset("screevoai/code-switch", split="test")
>>> sample_url = ds[2]["audio_file_path"]  # change the index to test a different audio file

>>> # Download the audio file
>>> response = requests.get(sample_url)
>>> audio_file_data = BytesIO(response.content)

>>> # Resample the audio to 16 kHz, the sampling rate Whisper expects
>>> audio, sr = librosa.load(audio_file_data, sr=None)
>>> audio_resampled = librosa.resample(audio, orig_sr=sr, target_sr=16000)
>>> processed_audio = processor(audio_resampled, sampling_rate=16000, return_tensors="pt")
>>> input_features = processed_audio["input_features"]

>>> # Generate the transcription
>>> output_ids = model.generate(input_features, max_new_tokens=400)
>>> transcription = processor.batch_decode(output_ids, skip_special_tokens=True)[0]
>>> print(transcription)
```
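
### Running on GPU (optional)

The script above runs on CPU by default. If a CUDA-capable GPU is available, generation is considerably faster. Below is a minimal sketch of the same generation step on GPU, assuming a PyTorch build with CUDA support and the `model`, `input_features`, and `processor` objects from the script above:

```python
>>> import torch

>>> # Move the model and the input features to the GPU when one is available
>>> device = "cuda" if torch.cuda.is_available() else "cpu"
>>> model = model.to(device)
>>> output_ids = model.generate(input_features.to(device), max_new_tokens=400)
>>> transcription = processor.batch_decode(output_ids, skip_special_tokens=True)[0]
>>> print(transcription)
```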
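
### Scoring transcriptions with WER

The Wer reported above is a word error rate expressed as a percentage. To score the model's transcriptions against reference texts of your own, here is a minimal sketch using the `evaluate` library (an assumption on our part, not necessarily how the reported value was computed; it additionally requires `pip install evaluate jiwer`, and the example strings are hypothetical):

```python
>>> import evaluate

>>> # WER = (substitutions + deletions + insertions) / number of words in the reference
>>> wer_metric = evaluate.load("wer")
>>> predictions = ["hello world how are you"]        # hypothetical model transcription
>>> references = ["hello world how are you today"]   # hypothetical ground-truth text
>>> print(100 * wer_metric.compute(predictions=predictions, references=references))  # one deletion over six words, ~16.67
```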