---
license: mit
---

# Belle-whisper-large-v3-zh-punct model for CTranslate2

This repository contains the conversion of [BELLE-2/Belle-whisper-large-v3-zh-punct](https://huggingface.co/BELLE-2/Belle-whisper-large-v3-zh-punct) to the [CTranslate2](https://github.com/OpenNMT/CTranslate2) model format.

This model can be used in CTranslate2 or projects based on CTranslate2 such as [faster-whisper](https://github.com/guillaumekln/faster-whisper).

## Example

Installation

```bash
pip install faster-whisper
```

Usage

```python
from faster_whisper import WhisperModel
import datetime
import os

# Confirmed to work with faster-whisper 1.0.2, numpy 1.23.5, onnxruntime 1.14.1.
# This code will not work if the numpy version exceeds 2.0.0 and vad_filter=True.

def transcribe_audio(input_file, output_file):
    model_size = "XA9/Belle-faster-whisper-large-v3-zh-punct-int8"
    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, info = model.transcribe(
        input_file,
        word_timestamps=True,
        initial_prompt=None,
        beam_size=5,
        language="zh",
        max_new_tokens=128,
        condition_on_previous_text=False,
        vad_filter=False,
        vad_parameters=dict(min_silence_duration_ms=500),
    )

    sub_list = []
    for segment in segments:
        print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
        start_time_str = format_to_srt(segment.start)
        end_time_str = format_to_srt(segment.end)
        sub_text = replace_special_chars(segment.text)
        # Add the formatted subtitle to the list
        sub_list.append(f"{start_time_str} --> {end_time_str}\n{sub_text}\n\n")

    # Prefix each subtitle with its index number (SRT numbering starts at 1)
    srt_content = ""
    for srt_number, sub in enumerate(sub_list, start=1):
        srt_content += str(srt_number) + "\n" + sub

    with open(output_file, "w", encoding="utf-8") as srt_file:
        srt_file.write(srt_content)
    print("")
    print("Saved: " + os.path.abspath(output_file))

def replace_special_chars(text):
    # If the text starts with "! " or a space, remove every "!"
    # and the first space; otherwise return the text unchanged.
    if text.startswith("! ") or text.startswith(" "):
        text = text.replace("!", "").replace(" ", "", 1)  # only the first space is removed
    return text

def format_to_srt(seconds):
    # Convert seconds to an SRT timecode (HH:MM:SS,mmm)
    dt = datetime.datetime(1, 1, 1) + datetime.timedelta(seconds=seconds)
    formatted_time = "{:02d}:{:02d}:{:02d},{:03d}".format(
        dt.hour, dt.minute, dt.second, dt.microsecond // 1000
    )
    return formatted_time

transcribe_audio("audio.mp3", "audio.srt")
```

## Example (transcribe with stable-ts)

Installation

Requires FFmpeg in PATH.

```bash
pip install faster-whisper
pip install stable-ts
```

Usage

```python
import stable_whisper

model = stable_whisper.load_faster_whisper(
    'XA9/Belle-faster-whisper-large-v3-zh-punct-int8',
    device='cpu',
    compute_type='int8',
)
result = model.transcribe_stable(
    'audio.mp3',
    language='zh',
    initial_prompt=None,
    regroup=False,
    vad=False,
    condition_on_previous_text=False,
)
result.to_srt_vtt('audio.srt', word_level=False)
```

## Conversion details

The original model was converted with the following command:

```
ct2-transformers-converter --model BELLE-2/Belle-whisper-large-v3-zh-punct \
    --output_dir Belle-faster-whisper-large-v3-zh-punct-int8 \
    --copy_files tokenizer.json preprocessor_config.json \
    --quantization int8
```
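
If the conversion needs to be scripted, the same command can be assembled from Python. The sketch below is only an illustration: `build_convert_command` and `run_conversion` are hypothetical helper names, not part of CTranslate2, and the code assumes `ct2-transformers-converter` is already available in PATH. It uses only the flags shown in the command above.

```python
import shutil
import subprocess


def build_convert_command(model, output_dir, copy_files, quantization):
    """Return the argument list for ct2-transformers-converter (hypothetical helper)."""
    return [
        "ct2-transformers-converter",
        "--model", model,
        "--output_dir", output_dir,
        "--copy_files", *copy_files,
        "--quantization", quantization,
    ]


def run_conversion(command):
    """Run the converter; assumes the CLI is installed and in PATH."""
    if shutil.which(command[0]) is None:
        raise FileNotFoundError(f"{command[0]} not found in PATH")
    subprocess.run(command, check=True)


command = build_convert_command(
    model="BELLE-2/Belle-whisper-large-v3-zh-punct",
    output_dir="Belle-faster-whisper-large-v3-zh-punct-int8",
    copy_files=["tokenizer.json", "preprocessor_config.json"],
    quantization="int8",
)
print(" ".join(command))
# run_conversion(command)  # uncomment to convert (this downloads the original model)
```

Building the command as a list rather than a single string avoids shell quoting issues, and keeping the actual run in a separate function makes it possible to inspect the command before starting a long conversion.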