HoangHa committed on
Commit
b8ba221
·
verified ·
1 Parent(s): 034de8a

Create README.md

Files changed (1)
  1. README.md +97 -0
README.md ADDED
@@ -0,0 +1,97 @@
1
+ ---
2
+ datasets:
3
+ - homebrewltd/instruction-speech-whispervq-v2
4
+ language:
5
+ - en
6
+ - vi
7
+ license: apache-2.0
8
+ tags:
9
+ - sound language model
10
+ - audio-text-to-text
11
+ - torchtune
12
+ - whisperspeech
13
+ ---
14
+
15
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/630a5ef0e81e1dea2cedcec0/GCl7IYZaL1NebkdEEyhab.png)
16
+
17
+
18
+ ## Ichigo Whisper
19
+
20
+ Ichigo Whisper is a compact (22M parameters), open-source quantizer for the `Whisper-medium` model, designed to improve performance on *low-resource languages* with minimal impact on the model's original English capabilities. Unlike models that output continuous embeddings, Ichigo Whisper compresses speech into discrete tokens, which makes its output easier to feed directly to large language models (LLMs) for speech understanding.
21
+
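To make "discrete tokens" concrete, here is a minimal, generic sketch of vector quantization: each continuous embedding frame is replaced by the index of its nearest codebook vector. This is not the Ichigo Whisper implementation. The `quantize` helper, the toy 4-entry codebook, and the 2-dimensional frames are all hypothetical and chosen only for illustration.

```python
import math


def quantize(frames, codebook):
    """Map each continuous frame to the index of its nearest codebook vector."""
    tokens = []
    for frame in frames:
        # Euclidean distance from this frame to every codebook entry
        distances = [math.dist(frame, code) for code in codebook]
        # The token is the index of the closest entry
        tokens.append(distances.index(min(distances)))
    return tokens


# Hypothetical toy codebook (a real one would hold many high-dimensional vectors)
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
# Hypothetical continuous "embedding frames"
frames = [(0.1, -0.2), (0.9, 0.8), (0.2, 1.1)]

tokens = quantize(frames, codebook)
print(tokens)
```

The resulting integer sequence is the kind of input an LLM can consume as ordinary vocabulary items, which is the compatibility argument made above.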
22
+ This quantized version of Whisper-medium has been trained on over XXX hours of English data and XXX hours of Vietnamese data.
23
+
24
+ Ichigo Whisper is a key component of the Ichigo v0.5 family.
25
+
26
+ For more details, please refer to our official [blog post].
27
+
28
+ ### Model Summary
29
+
30
+ **Developed by:** Homebrew Research
31
+
32
+ **Model Architecture:** WhisperVQ
33
+
34
+ **Model type:** Quantizer of Whisper
35
+
36
+ **Language(s):** English and Vietnamese
37
+
38
+ **License:** Apache 2.0
39
+
40
+ ### Resources
41
+
42
+ **Demo:** [Ichigo Whisper demo]
43
+
44
+ **Blog:** [Blog post]
45
+
46
+ ## Intended Use
47
+
48
+ **Intended Use Cases** This model is primarily intended for research applications. This version aims to further improve Whisper's performance on low-resource languages.
49
+
50
+ **Out-of-scope** The use of Ichigo Whisper in any manner that violates applicable laws or regulations is strictly prohibited.
51
+
52
+ ## How to Get Started
53
+
54
+ You can use the example code below to load the model.
55
+
56
+ ```python
57
+
58
+ ```
59
+
60
+ ## Evaluation
61
+
62
+ 1. Vietnamese
63
+
64
+ | Model Name | Codebook Size | Test Dataset | Language | Test Samples | WER (%) |
65
+ |------------|---------------|--------------|---------------|--------------|-----|
66
+ | **IchigoWhisper** | 2561 | viVoice | Vi | 1000 | **11.36** |
67
+ | Whisper Medium | - | viVoice | Vi | 1000 | 18.64 |
68
+
69
+ 2. English
70
+
71
+ | Model Name | Codebook Size | Test Dataset | Test Samples | WER (%) |
72
+ |------------|---------------|--------------|--------------|-----|
73
+ | **IchigoWhisper** | 2561 | LibriTTS-R | 1000 | **12.96** |
74
+ | Whisper Medium | - | LibriTTS-R | 1000 | 12.99 |
75
+
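The model card does not say how WER was computed. As a sketch under the standard definition (word-level edit distance divided by the number of reference words), the metric and the relative improvement implied by the tables can be reproduced as follows. The `wer` and `relative_reduction` helpers and the sample sentences are illustrative assumptions, and any text normalization used in the actual evaluation is unknown.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference prefix seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Relative WER reduction, in percent of the baseline."""
    return 100.0 * (baseline - improved) / baseline


# Hypothetical sentence pair: one substitution and one deletion over 6 words
sample_wer = wer("the cat sat on the mat", "the cat sit on mat")

# WER values taken from the tables above
vi_gain = relative_reduction(18.64, 11.36)
en_gain = relative_reduction(12.99, 12.96)

print(f"sample WER: {sample_wer:.3f}")
print(f"Vietnamese relative reduction: {vi_gain:.1f}%")
print(f"English relative reduction: {en_gain:.1f}%")
```

With the reported numbers, this works out to roughly a 39% relative WER reduction on viVoice and an essentially unchanged WER on LibriTTS-R.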
76
+ ## Citation Information
77
+
78
+ **BibTeX:**
79
+
80
+ ```
81
+ @article{IchigoWhisper2024,
82
+   title={IchigoWhisper},
83
+   author={Homebrew Research},
84
+   year={2024},
85
+   month={December},
86
+   url={https://huggingface.co/homebrewltd/Ichigo-whisper}
+ }
87
+ ```
88
+
89
+ ## Acknowledgement
90
+
91
+ - **[WhisperSpeech](https://github.com/collabora/WhisperSpeech)**
92
+
93
+ - **[Whisper](https://github.com/openai/whisper)**
94
+
95
+ - **[viVoice](https://huggingface.co/datasets/capleaf/viVoice)**
96
+
97
+ - **[LibriTTS]**