Belle-whisper-large-v3-zh-punct model for CTranslate2

This repository contains the conversion of BELLE-2/Belle-whisper-large-v3-zh-punct to the CTranslate2 model format.

This model can be used in CTranslate2 or projects based on CTranslate2 such as faster-whisper.

Example (transcribe with faster-whisper)

Installation

pip install faster-whisper
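The usage code below notes that vad_filter=True breaks once numpy's version exceeds 2.0.0. A preflight check can decide whether to enable the filter. This is a minimal sketch that uses only the standard library. The helper names `numpy_major_version` and `vad_filter_safe` are hypothetical and are not part of faster-whisper. The sketch also assumes every numpy 2.x release is affected, including 2.0.0 itself.

```python
from importlib.metadata import PackageNotFoundError, version


def numpy_major_version():
    """Return the installed numpy major version, or None if numpy is not installed."""
    try:
        return int(version("numpy").split(".")[0])
    except PackageNotFoundError:
        return None


def vad_filter_safe(major):
    """Hypothetical helper: report whether vad_filter=True is expected to work.

    Assumption: any numpy 2.x (or later) breaks the VAD filter, per the README note.
    """
    return major is not None and major < 2


if __name__ == "__main__":
    major = numpy_major_version()
    print("numpy major version:", major, "-> vad_filter safe:", vad_filter_safe(major))
```

The result could then be passed as `vad_filter=vad_filter_safe(numpy_major_version())` in the `model.transcribe(...)` call.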

Usage

from faster_whisper import WhisperModel
import datetime
import os

# Confirmed to work with faster-whisper 1.0.2, numpy 1.23.5 and onnxruntime 1.14.1.
# This code does not work when numpy's version exceeds 2.0.0 and vad_filter=True.

def transcribe_audio(input_file, output_file):
    model_size = "XA9/Belle-faster-whisper-large-v3-zh-punct-int8"
    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, info = model.transcribe(input_file, word_timestamps=True, initial_prompt=None,
        beam_size=5, language="zh", max_new_tokens=128, condition_on_previous_text=False,
        vad_filter=False, vad_parameters=dict(min_silence_duration_ms=500))

    # `segments` is a generator: transcription actually runs while it is iterated.
    srt_entries = []
    for srt_number, segment in enumerate(segments, start=1):  # SRT cue numbers start at 1
        print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
        start_time_str = format_to_srt(segment.start)
        end_time_str = format_to_srt(segment.end)
        sub_text = replace_special_chars(segment.text)
        srt_entries.append(f"{srt_number}\n{start_time_str} --> {end_time_str}\n{sub_text}\n\n")

    with open(output_file, "w", encoding="utf-8") as srt_file:
        srt_file.write("".join(srt_entries))

    print("")
    print("Saved: " + os.path.abspath(output_file))

def replace_special_chars(text):
    # Strip a leading space, then a leading "! ", from the segment text.
    # Only the start of the string is touched; "!" elsewhere in the text is kept.
    if text.startswith(" "):
        text = text[1:]
    if text.startswith("! "):
        text = text[2:]
    return text


def format_to_srt(seconds):  # Convert seconds to an SRT timecode (HH:MM:SS,mmm); hours wrap at 24
    dt = datetime.datetime(1, 1, 1) + datetime.timedelta(seconds=seconds)
    formatted_time = "{:02d}:{:02d}:{:02d},{:03d}".format(dt.hour, dt.minute, dt.second, dt.microsecond//1000)
    return formatted_time


transcribe_audio("audio.mp3", "audio.srt")
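To check the SRT layout without loading the model, the formatting step can be written as pure functions. This is a minimal sketch, and the names `format_srt_timestamp` and `build_srt` are hypothetical. It uses integer milliseconds instead of `datetime`, so hours keep counting past 24. It also rounds to the nearest millisecond rather than truncating.

```python
def format_srt_timestamp(seconds):
    """Convert seconds to an SRT timecode (HH:MM:SS,mmm) using integer milliseconds.

    Unlike the datetime-based format_to_srt above, hours are not wrapped at 24.
    """
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def build_srt(cues):
    """Build SRT text from (start_seconds, end_seconds, text) tuples.

    Cue numbers start at 1; each cue's text is stripped of surrounding whitespace,
    and cues are separated by a blank line.
    """
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        timing = f"{format_srt_timestamp(start)} --> {format_srt_timestamp(end)}"
        blocks.append(f"{index}\n{timing}\n{text.strip()}\n")
    return "\n".join(blocks)


if __name__ == "__main__":
    print(build_srt([(0.0, 1.5, " 你好"), (1.5, 3.25, "世界")]))
```

With faster-whisper, the input tuples could come from `[(s.start, s.end, s.text) for s in segments]`.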

Example (transcribe with stable-ts)

Installation

Requires FFmpeg to be installed and available on PATH.
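Because stable-ts needs FFmpeg on PATH, a quick preflight check can report a missing install with a clear message before any audio is loaded. This is a minimal sketch that uses only the standard library. The name `ffmpeg_on_path` is a hypothetical helper and is not part of stable-ts.

```python
import shutil


def ffmpeg_on_path():
    """Return the full path of the ffmpeg executable, or None if it is not on PATH."""
    return shutil.which("ffmpeg")


if __name__ == "__main__":
    path = ffmpeg_on_path()
    if path is None:
        print("FFmpeg was not found on PATH; install it before using stable-ts.")
    else:
        print("Using FFmpeg at", path)
```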

pip install faster-whisper
pip install stable-ts

Usage

import stable_whisper

model = stable_whisper.load_faster_whisper('XA9/Belle-faster-whisper-large-v3-zh-punct-int8', device='cpu', compute_type='int8')
result = model.transcribe_stable('audio.mp3', language='zh', initial_prompt=None, regroup=False, vad=False, condition_on_previous_text=False)
result.to_srt_vtt('audio.srt', word_level=False)

Conversion details

The original model was converted with the following command:

ct2-transformers-converter --model BELLE-2/Belle-whisper-large-v3-zh-punct --output_dir Belle-faster-whisper-large-v3-zh-punct-int8 --copy_files tokenizer.json preprocessor_config.json --quantization int8