# Whisper v3 is here!

Whisper v3 is a new open-source speech recognition model from OpenAI. It handles multilingual transcription impressively well. For example, you can switch from English to Spanish or Chinese in the middle of a sentence and it will still transcribe correctly!

The model runs in a free Google Colab instance and is already integrated into `transformers`, so switching is a smooth process if you already use a previous version.

In [1]:
%%capture
!pip install git+https://github.com/huggingface/transformers gradio


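Let's use the high-level `pipeline` from the `transformers` library to load the model. We also load a MarianMT English-to-Arabic model (`Helsinki-NLP/opus-mt-en-ar`), which the demo further down uses to translate the transcript.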

In [None]:
import torch
from transformers import MarianMTModel, MarianTokenizer, pipeline

# Speech recognition: Whisper large-v3 in half precision on the first GPU.
pipe = pipeline(
    "automatic-speech-recognition",
    "openai/whisper-large-v3",
    torch_dtype=torch.float16,
    device="cuda:0",
)

# Translation: MarianMT English-to-Arabic model, used later by the demo.
model_name_translate = "Helsinki-NLP/opus-mt-en-ar"
tokenizer_translation = MarianTokenizer.from_pretrained(model_name_translate)
model_translate = MarianMTModel.from_pretrained(model_name_translate)

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
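
The cell above hard-codes `device="cuda:0"` and `torch.float16`, so it will fail on a machine without a GPU. Here is a minimal sketch of a fallback. `pick_device_and_dtype` is a hypothetical helper (not part of `transformers`), and it returns the dtype as a string so the logic can be tried even where `torch` is not installed.

```python
def pick_device_and_dtype(cuda_available):
    """Return a (device, dtype_name) pair for the ASR pipeline.

    Hypothetical helper for illustration, not a transformers API.
    """
    if cuda_available:
        # Half precision roughly halves the memory needed for the weights.
        return "cuda:0", "float16"
    # float16 is often slow or unsupported on CPU, so fall back to float32.
    return "cpu", "float32"


try:
    import torch

    cuda_available = torch.cuda.is_available()
except ImportError:
    # torch is missing: assume a CPU-only setup.
    cuda_available = False

device, dtype_name = pick_device_and_dtype(cuda_available)
print(device, dtype_name)
```

With `torch` imported, the result can be passed to the pipeline as `torch_dtype=getattr(torch, dtype_name), device=device`. Expect CPU inference with the large checkpoint to be much slower than on a GPU.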


In [None]:
pipe("https://cdn-media.huggingface.co/speech_samples/sample1.flac")

{'text': " going along slushy country roads and speaking to damp audiences in draughty schoolrooms day after day for a fortnight he'll have to put in an appearance at some place of worship on sunday morning and he can come to us immediately afterwards"}
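
When the pipeline is called with `return_timestamps=True`, the result also carries a `chunks` list, where each entry has a `timestamp` pair and its `text`. The sketch below turns such a result into readable lines. `format_chunks` is a hypothetical helper, and `sample_result` is made-up data that only mimics the shape of the pipeline output, not a real transcription.

```python
def format_chunks(result):
    """Format the `chunks` of an ASR pipeline result, one line per segment."""
    lines = []
    for chunk in result.get("chunks", []):
        start, end = chunk["timestamp"]
        # The end time can be None when the final segment is cut off.
        end_label = f"{end:.2f}" if end is not None else "?"
        lines.append(f"[{start:.2f} -> {end_label}] {chunk['text'].strip()}")
    return "\n".join(lines)


# Illustrative data only: same structure as the pipeline output, invented values.
sample_result = {
    "text": " first segment second segment",
    "chunks": [
        {"timestamp": (0.0, 1.5), "text": " first segment"},
        {"timestamp": (1.5, None), "text": " second segment"},
    ],
}

print(format_chunks(sample_result))
```

With a loaded pipeline, the same helper could be applied to `pipe(audio, return_timestamps=True)`.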

Let's now build a quick Gradio demo where we can play with the model directly using our microphone! The demo transcribes the audio with Whisper and then translates the English transcript into Arabic with the MarianMT model loaded above. You can run this code in a Google Colab instance (or locally!) or just head to the <a href="https://huggingface.co/spaces/hf-audio/whisper-large-v3" target="_blank">Space</a> to play with the transcription model online.

In [4]:
import gradio as gr

def translate(sentence):
    # Tokenize the English sentence and generate its Arabic translation.
    batch = tokenizer_translation([sentence], return_tensors="pt")
    generated_ids = model_translate.generate(**batch)
    text = tokenizer_translation.batch_decode(generated_ids, skip_special_tokens=True)[0]
    return text

def transcribe(inputs):
    if inputs is None:
        raise gr.Error("No audio file submitted! Please record an audio clip before submitting your request.")

    # Transcribe the audio file, then pass the plain transcript string to the translator.
    text = pipe(inputs, generate_kwargs={"task": "transcribe"}, return_timestamps=True)["text"]
    text = translate(text.strip())
    return text

demo = gr.Interface(
    fn=transcribe,
    inputs=[
        gr.Audio(sources=["microphone", "upload"], type="filepath"),
    ],
    outputs="text",
    title="Whisper Large V3: Transcribe Audio",
    description=(
        "Transcribe long-form microphone or audio inputs with the click of a button! Demo uses the"
        " checkpoint [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3) and 🤗 Transformers to transcribe audio files"
        " of arbitrary length."
    ),
    allow_flagging="never",
)

demo.launch()


Running on local URL:  http://127.0.0.1:7862

To create a public link, set `share=True` in `launch()`.
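
One thing to watch: `translate` sends the whole transcript to the translation model as a single input, and MarianMT models only accept inputs up to a limited length. A minimal sketch of one workaround is to split the transcript into sentence groups first. `split_for_translation` is a hypothetical helper, and the `max_chars=400` budget is an arbitrary assumption rather than a value taken from the model.

```python
import re


def split_for_translation(text, max_chars=400):
    """Group sentences into pieces of at most roughly `max_chars` characters."""
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    groups, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # The current group is full: close it and start a new one.
            groups.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        groups.append(current)
    return groups


print(split_for_translation("One. Two. Three.", max_chars=9))
```

Inside `transcribe`, the pieces could then be translated one by one and joined, for example `" ".join(translate(piece) for piece in split_for_translation(text))`. Note that a single sentence longer than `max_chars` is kept whole, and a transcript without punctuation is not split at all.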


