---
license: cc-by-nc-sa-4.0
---

# Cool-Whisper

### [Leave No Knowledge Behind During Knowledge Distillation: Towards Practical and Effective Knowledge Distillation for Code-Switching ASR Using Realistic Data](https://arxiv.org/abs/2407.10603)

Liang-Hsuan Tseng, Zih-Ching Chen, Wei-Shun Chang, Cheng-Kuang Lee, Tsung-Ren Huang, Hung-yi Lee

[![arXiv](https://img.shields.io/badge/arXiv-Paper-color.svg)](https://arxiv.org/abs/2407.10603)
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/#fileId=https://huggingface.co/andybi7676/cool-whisper/blob/main/Cool_Whisper_Colab_Demo_(via_faster-whisper).ipynb)

> ⚠️ Due to privacy and security concerns, this model will be temporarily taken offline. We are sorry for the inconvenience.

## Introduction

* Cool-Whisper is a distilled version of Whisper, mainly focused on **Mandarin-English** code-switching ASR for people in Taiwan.
* We use 60,000 hours of **unlabeled** audio to train the model.
* In practice, we draw on *knowledge* not only from the large model (Whisper-large-v2) but also from the small model (Whisper-base).

## Basic Usage

* This model repository is in [CTranslate2](https://github.com/OpenNMT/CTranslate2/) format and is compatible with [faster-whisper](https://github.com/SYSTRAN/faster-whisper).
* Using faster-whisper can make generation about **3 to 5 times faster** than the original implementation from [OpenAI](https://github.com/openai/whisper).
* If you prefer to use the model through Hugging Face `transformers`, please visit https://huggingface.co/andybi7676/cool-whisper-hf

```python
from faster_whisper import WhisperModel
import soundfile as sf

model_card = "andybi7676/cool-whisper"
audio_fpath = "/your/path/to/audio.wav"

# Print basic audio metadata (sample rate, duration, channels) for debugging.
audio_info = sf.info(audio_fpath)
print(audio_info)

# Load the CTranslate2 model on GPU in half precision.
model = WhisperModel(model_card, device="cuda", compute_type="float16")

# Use "zh" for zh-en code-switching in cool-whisper.
segments, info = model.transcribe(
    audio_fpath, beam_size=5, language="zh", condition_on_previous_text=True
)

# `segments` is a generator: transcription runs as it is iterated.
for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
```
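If you want to keep the transcript rather than just print it, the segments can be written out as subtitles. The following is a minimal sketch, not part of this repository: it assumes only that each segment exposes `start`, `end`, and `text`, as used in the loop above. The `Segment` dataclass is a hypothetical stand-in so the example runs without a model, and the helper names `srt_timestamp` and `segments_to_srt` are our own.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Segment:
    """Hypothetical stand-in exposing the fields used above: start, end, text."""
    start: float
    end: float
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: Iterable) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        time_range = f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}"
        blocks.append(f"{index}\n{time_range}\n{seg.text.strip()}\n")
    return "\n".join(blocks)


if __name__ == "__main__":
    demo = [Segment(0.0, 1.25, " hello"), Segment(1.25, 2.0, " world")]
    print(segments_to_srt(demo))
```

With a real model, `segments_to_srt(segments)` could be called in place of the printing loop. Because `segments` is a generator, it can be consumed only once, so choose either printing or exporting, or collect it into a list first.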