Belle-whisper-large-v3-zh-punct model for CTranslate2
This repository contains the conversion of BELLE-2/Belle-whisper-large-v3-zh-punct to the CTranslate2 model format.
This model can be used in CTranslate2 or projects based on CTranslate2 such as faster-whisper.
Example
Installation
pip install faster-whisper
Usage
```python
from faster_whisper import WhisperModel
import datetime
import os

# Confirmed to work with faster-whisper 1.0.2, numpy 1.23.5 and onnxruntime 1.14.1.
# With numpy >= 2.0.0, this code does not work when vad_filter=True.


def transcribe_audio(input_file, output_file):
    model_size = "XA9/Belle-faster-whisper-large-v3-zh-punct-int8"
    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, info = model.transcribe(
        input_file, word_timestamps=True, initial_prompt=None,
        beam_size=5, language="zh", max_new_tokens=128, condition_on_previous_text=False,
        vad_filter=False, vad_parameters=dict(min_silence_duration_ms=500),
    )
    sub_list = []
    # segments is a generator: transcription actually runs while iterating over it
    for segment in segments:
        print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
        start_time_str = format_to_srt(segment.start)
        end_time_str = format_to_srt(segment.end)
        sub_text = replace_special_chars(segment.text)
        sub_entry = f"{start_time_str} --> {end_time_str}\n{sub_text}\n\n"
        sub_list.append(sub_entry)  # Add the formatted subtitle to the list
    # Prefix each subtitle with its index; SRT numbering starts at 1
    srt_content = ""
    for srt_number, sub in enumerate(sub_list, start=1):
        srt_content += str(srt_number) + "\n" + sub
    with open(output_file, "w", encoding="utf-8") as srt_file:
        srt_file.write(srt_content)
    print()
    print("Saved: " + os.path.abspath(output_file))


def replace_special_chars(text):
    # Remove a leading "! " or a leading space; the rest of the text is left untouched
    if text.startswith("! "):
        return text[2:]
    if text.startswith(" "):
        return text[1:]
    return text


def format_to_srt(seconds):
    # Convert seconds to an SRT timecode (HH:MM:SS,mmm); valid for durations under 24 hours
    dt = datetime.datetime(1, 1, 1) + datetime.timedelta(seconds=seconds)
    return "{:02d}:{:02d}:{:02d},{:03d}".format(dt.hour, dt.minute, dt.second, dt.microsecond // 1000)


if __name__ == "__main__":
    transcribe_audio("audio.mp3", "audio.srt")
```
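The SRT-writing half of the script does not depend on the model, so it can be checked without downloading anything. The sketch below is a standalone illustration, not part of the original script: it assumes segments are plain `(start, end, text)` tuples in seconds (a hypothetical stand-in for faster-whisper's segment objects), and `srt_timecode` and `build_srt` are hypothetical helper names. Cues are numbered from 1, and surrounding whitespace is stripped from each text.

```python
def srt_timecode(seconds: float) -> str:
    """Format seconds as an SRT timecode HH:MM:SS,mmm, rounded to the nearest millisecond."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def build_srt(segments):
    """Build SRT text from hypothetical (start, end, text) tuples; cue numbers start at 1."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(f"{index}\n{srt_timecode(start)} --> {srt_timecode(end)}\n{text.strip()}\n")
    # A blank line separates consecutive cues
    return "\n".join(blocks)


if __name__ == "__main__":
    demo = [(0.0, 2.5, " 你好。"), (2.5, 3661.042, " 再见！")]
    print(build_srt(demo))
```

Unlike the `datetime`-based `format_to_srt` above, the `divmod` version does not wrap around after 24 hours, and it rounds to the nearest millisecond instead of truncating.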
Example (transcribe with stable-ts)
Installation
Requires FFmpeg to be available on the PATH.
pip install faster-whisper
pip install stable-ts
Usage
```python
import stable_whisper

model = stable_whisper.load_faster_whisper(
    'XA9/Belle-faster-whisper-large-v3-zh-punct-int8', device='cpu', compute_type='int8'
)
result = model.transcribe_stable(
    'audio.mp3', language='zh', initial_prompt=None, regroup=False, vad=False,
    condition_on_previous_text=False,
)
result.to_srt_vtt('audio.srt', word_level=False)
```
Conversion details
The original model was converted with the following command:
ct2-transformers-converter --model BELLE-2/Belle-whisper-large-v3-zh-punct --output_dir Belle-faster-whisper-large-v3-zh-punct-int8 --copy_files tokenizer.json preprocessor_config.json --quantization int8
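To repeat the conversion from a Python script (for example, with a different `--quantization` value such as `float16`), one option is to build the same command-line invocation and run it with `subprocess`. This is a sketch, not the author's procedure: `build_convert_command` and `run_conversion` are hypothetical helpers that use only the flags shown in the command above. Actually running the conversion assumes `ct2-transformers-converter` is on the PATH (it ships with the `ctranslate2` package and typically also needs `transformers` and PyTorch) and downloads the original model.

```python
import shlex
import subprocess


def build_convert_command(model, output_dir, quantization="int8",
                          copy_files=("tokenizer.json", "preprocessor_config.json")):
    """Return the argv list for ct2-transformers-converter (hypothetical helper)."""
    return [
        "ct2-transformers-converter",
        "--model", model,
        "--output_dir", output_dir,
        "--copy_files", *copy_files,
        "--quantization", quantization,
    ]


def run_conversion(model, output_dir, quantization="int8"):
    """Run the converter; raises CalledProcessError if it exits with a non-zero status."""
    cmd = build_convert_command(model, output_dir, quantization)
    print("Running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

Calling `run_conversion("BELLE-2/Belle-whisper-large-v3-zh-punct", "Belle-faster-whisper-large-v3-zh-punct-int8")` runs the same command as above. Passing the arguments as a list avoids shell quoting issues.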