Upload 2 files
README.md
CHANGED
@@ -14,16 +14,25 @@ pipeline_tag: text-to-speech
|
|
14 |
|
15 |
❤️ Kokoro Discord Server: https://discord.gg/QuGxSWBfQy
|
16 |
|
17 |
-
**Kokoro** is
|
18 |
|
19 |
-
- [Usage](https://huggingface.co/hexgrad/Kokoro-82M#usage)
|
20 |
- [Releases](https://huggingface.co/hexgrad/Kokoro-82M#releases)
|
|
|
21 |
- [Voices and Languages](https://huggingface.co/hexgrad/Kokoro-82M#voices-and-languages)
|
22 |
- [Model Facts](https://huggingface.co/hexgrad/Kokoro-82M#model-facts)
|
23 |
- [Training Details](https://huggingface.co/hexgrad/Kokoro-82M#training-details)
|
24 |
- [Creative Commons Attribution](https://huggingface.co/hexgrad/Kokoro-82M#creative-commons-attribution)
|
25 |
- [Acknowledgements](https://huggingface.co/hexgrad/Kokoro-82M#acknowledgements)
|
26 |
|
27 |
### Usage
|
28 |
|
29 |
[`pip install kokoro`](https://pypi.org/project/kokoro/) installs the inference library at https://github.com/hexgrad/kokoro
|
@@ -31,22 +40,21 @@ pipeline_tag: text-to-speech
|
|
31 |
You can run this cell on [Google Colab](https://colab.research.google.com/).
|
32 |
```py
|
33 |
# 1️⃣ Install kokoro
|
34 |
-
!pip install -q kokoro>=0.
|
35 |
-
# 2️⃣ Install espeak, used for
|
36 |
!apt-get -qq -y install espeak-ng > /dev/null 2>&1
|
37 |
-
# You can skip espeak installation, but OOD words will be skipped unless you provide a fallback
|
38 |
|
39 |
# 3️⃣ Initialize a pipeline
|
40 |
from kokoro import KPipeline
|
41 |
from IPython.display import display, Audio
|
42 |
import soundfile as sf
|
43 |
-
# 🇺🇸 'a' => American English
|
44 |
-
#
|
45 |
-
#
|
46 |
-
#
|
47 |
-
pipeline = KPipeline(lang_code='a') # make sure lang_code matches voice
|
48 |
|
49 |
-
#
|
50 |
text = '''
|
51 |
The sky above the port was the color of television, tuned to a dead channel.
|
52 |
"It's not like I'm using," Case heard someone say, as he shouldered his way through the crowd around the door of the Chat. "It's like my body's developed this massive drug deficiency."
|
@@ -56,6 +64,8 @@ These were to have an enormous impact, not only because they were associated wit
|
|
56 |
'''
|
57 |
# text = 'Le dromadaire resplendissant déambulait tranquillement dans les méandres en mastiquant de petites feuilles vernissées.'
|
58 |
# text = 'ट्रांसपोर्टरों की हड़ताल लगातार पांचवें दिन जारी, दिसंबर से इलेक्ट्रॉनिक टोल कलेक्शनल सिस्टम'
|
59 |
|
60 |
# 4️⃣ Generate, display, and save audio files in a loop.
|
61 |
generator = pipeline(
|
@@ -72,24 +82,13 @@ for i, (gs, ps, audio) in enumerate(generator):
|
|
72 |
|
73 |
Under the hood, `kokoro` uses [`misaki`](https://pypi.org/project/misaki/), a G2P library at https://github.com/hexgrad/misaki
|
74 |
|
75 |
-
### Releases
|
76 |
-
|
77 |
-
| Model | Published | Training Data | Compute (A100 80GB) | Released Langs & Voices | SHA256 |
|
78 |
-
| ----- | --------- | ------------- | ------------------- | ----------------------- | ------ |
|
79 |
-
| **v1.0** | 2025 Jan 27 | Few hundred hrs | $1000 for 1000 hrs | [3 & 31](https://huggingface.co/hexgrad/Kokoro-82M/blob/main/VOICES.md) | `496dba11` |
|
80 |
-
| [v0.19](https://huggingface.co/hexgrad/kLegacy/tree/main/v0.19) | 2024 Dec 25 | <100 hrs | $400 for 500 hrs | 1 & 10 | `3b0c392f` |
|
81 |
-
|
82 |
-
Training is continuous, so the compute footprints overlap.
|
83 |
-
|
84 |
-
v0.19 is now deprecated. You can access old v0.19 files [here](https://huggingface.co/hexgrad/kLegacy/tree/main/v0.19).
|
85 |
-
|
86 |
### Voices and Languages
|
87 |
|
88 |
Voices are listed in [VOICES.md](https://huggingface.co/hexgrad/Kokoro-82M/blob/main/VOICES.md). Not all voices are created equal:
|
89 |
- Subjectively, voices will sound better or worse to different people.
|
90 |
-
-
|
91 |
-
-
|
92 |
-
-
|
93 |
|
94 |
Support for non-English languages may be absent or thin due to weak G2P and/or lack of training data. Some languages are only represented by a small handful or even just one voice (French).
|
95 |
|
@@ -138,7 +137,7 @@ The following CC BY audio was part of the dataset used to train Kokoro v1.0.
|
|
138 |
- [@yl4579](https://huggingface.co/yl4579) for architecting StyleTTS 2.
|
139 |
- [@Pendrokar](https://huggingface.co/Pendrokar) for adding Kokoro as a contender in the TTS Spaces Arena.
|
140 |
- Thank you to everyone who contributed synthetic training data.
|
141 |
-
- Special thanks to
|
142 |
- Kokoro is a Japanese word that translates to "heart" or "spirit". Kokoro is also the name of an [AI in the Terminator franchise](https://terminator.fandom.com/wiki/Kokoro).
|
143 |
|
144 |
<img src="https://static0.gamerantimages.com/wordpress/wp-content/uploads/2024/08/terminator-zero-41-1.jpg" width="400" alt="kokoro" />
|
|
|
14 |
|
15 |
❤️ Kokoro Discord Server: https://discord.gg/QuGxSWBfQy
|
16 |
|
17 |
+
**Kokoro** is an open-weight TTS model with 82 million parameters. Despite its lightweight architecture, it delivers comparable quality to larger models while being significantly faster and more cost-efficient. With Apache-licensed weights, Kokoro can be deployed anywhere from production environments to personal projects.
|
18 |
|
|
|
19 |
- [Releases](https://huggingface.co/hexgrad/Kokoro-82M#releases)
|
20 |
+
- [Usage](https://huggingface.co/hexgrad/Kokoro-82M#usage)
|
21 |
- [Voices and Languages](https://huggingface.co/hexgrad/Kokoro-82M#voices-and-languages)
|
22 |
- [Model Facts](https://huggingface.co/hexgrad/Kokoro-82M#model-facts)
|
23 |
- [Training Details](https://huggingface.co/hexgrad/Kokoro-82M#training-details)
|
24 |
- [Creative Commons Attribution](https://huggingface.co/hexgrad/Kokoro-82M#creative-commons-attribution)
|
25 |
- [Acknowledgements](https://huggingface.co/hexgrad/Kokoro-82M#acknowledgements)
|
26 |
|
27 |
+
### Releases
|
28 |
+
|
29 |
+
| Model | Published | Training Data | Compute (A100 80GB) | Released Langs & Voices | SHA256 |
|
30 |
+
| ----- | --------- | ------------- | ------------------- | ----------------------- | ------ |
|
31 |
+
| **v1.0** | 2025 Jan 27 | Few hundred hrs | $1000 for 1000 hrs | [5 & 40](https://huggingface.co/hexgrad/Kokoro-82M/blob/main/VOICES.md) | `496dba11` |
|
32 |
+
| [v0.19](https://huggingface.co/hexgrad/kLegacy/tree/main/v0.19) | 2024 Dec 25 | <100 hrs | $400 for 500 hrs | 1 & 10 | `3b0c392f` |
|
33 |
+
|
34 |
+
v0.19 has been deprecated. You can access old v0.19 files [here](https://huggingface.co/hexgrad/kLegacy/tree/main/v0.19).
|
35 |
+
|
36 |
### Usage
|
37 |
|
38 |
[`pip install kokoro`](https://pypi.org/project/kokoro/) installs the inference library at https://github.com/hexgrad/kokoro
|
|
|
40 |
You can run this cell on [Google Colab](https://colab.research.google.com/).
|
41 |
```py
|
42 |
# 1️⃣ Install kokoro
|
43 |
+
!pip install -q "kokoro>=0.3.1" soundfile
|
44 |
+
# 2️⃣ Install espeak, used for English OOD fallback and some non-English languages
|
45 |
!apt-get -qq -y install espeak-ng > /dev/null 2>&1
|
|
|
46 |
|
47 |
# 3️⃣ Initialize a pipeline
|
48 |
from kokoro import KPipeline
|
49 |
from IPython.display import display, Audio
|
50 |
import soundfile as sf
|
51 |
+
# 🇺🇸 'a' => American English, 🇬🇧 'b' => British English
|
52 |
+
# 🇫🇷 'f' => French, 🇮🇳 'h' => Hindi: apt-get install espeak-ng
|
53 |
+
# 🇯🇵 'j' => Japanese: pip install misaki[ja]
|
54 |
+
# 🇨🇳 'z' => Mandarin Chinese: pip install misaki[zh]
|
55 |
+
pipeline = KPipeline(lang_code='a') # <= make sure lang_code matches voice
|
56 |
|
57 |
+
# This text is for demonstration purposes only, unseen during training
|
58 |
text = '''
|
59 |
The sky above the port was the color of television, tuned to a dead channel.
|
60 |
"It's not like I'm using," Case heard someone say, as he shouldered his way through the crowd around the door of the Chat. "It's like my body's developed this massive drug deficiency."
|
|
|
64 |
'''
|
65 |
# text = 'Le dromadaire resplendissant déambulait tranquillement dans les méandres en mastiquant de petites feuilles vernissées.'
|
66 |
# text = 'ट्रांसपोर्टरों की हड़ताल लगातार पांचवें दिन जारी, दिसंबर से इलेक्ट्रॉनिक टोल कलेक्शनल सिस्टम'
|
67 |
+
# text = '「もしおれがただ偶然、そしてこうしようというつもりでなくここに立っているのなら、ちょっとばかり絶望するところだな」と、そんなことが彼の頭に思い浮かんだ。'
|
68 |
+
# text = '中國人民不信邪也不怕邪,不惹事也不怕事,任何外國不要指望我們會拿自己的核心利益做交易,不要指望我們會吞下損害我國主權、安全、發展利益的苦果!'
|
69 |
|
70 |
# 4️⃣ Generate, display, and save audio files in a loop.
|
71 |
generator = pipeline(
|
|
|
82 |
|
83 |
Under the hood, `kokoro` uses [`misaki`](https://pypi.org/project/misaki/), a G2P library at https://github.com/hexgrad/misaki
|
84 |
|
85 |
### Voices and Languages
|
86 |
|
87 |
Voices are listed in [VOICES.md](https://huggingface.co/hexgrad/Kokoro-82M/blob/main/VOICES.md). Not all voices are created equal:
|
88 |
- Subjectively, voices will sound better or worse to different people.
|
89 |
+
- Less training data for a given voice (minutes instead of hours) => worse inference quality.
|
90 |
+
- Poor audio quality in training data (compression, sample rate, artifacts) => worse inference quality.
|
91 |
+
- Text-audio misalignment (too much text, i.e. hallucinations, or not enough text, i.e. failed transcriptions) => worse inference quality.
|
92 |
|
93 |
Support for non-English languages may be absent or thin due to weak G2P and/or lack of training data. Some languages are only represented by a small handful or even just one voice (French).
|
94 |
|
|
|
137 |
- [@yl4579](https://huggingface.co/yl4579) for architecting StyleTTS 2.
|
138 |
- [@Pendrokar](https://huggingface.co/Pendrokar) for adding Kokoro as a contender in the TTS Spaces Arena.
|
139 |
- Thank you to everyone who contributed synthetic training data.
|
140 |
+
- Special thanks to all compute sponsors. ❤️
|
141 |
- Kokoro is a Japanese word that translates to "heart" or "spirit". Kokoro is also the name of an [AI in the Terminator franchise](https://terminator.fandom.com/wiki/Kokoro).
|
142 |
|
143 |
<img src="https://static0.gamerantimages.com/wordpress/wp-content/uploads/2024/08/terminator-zero-41-1.jpg" width="400" alt="kokoro" />
|
VOICES.md
CHANGED
@@ -73,3 +73,26 @@ Hindi G2P: espeak-ng `hi`
|
|
73 |
| hm_psi | 🚹 | B | MM minutes | C |
|
74 |
|
75 |
This table lists all Hindi training data seen by Kokoro, which totals about 6 hours.
|
|
73 |
| hm_psi | 🚹 | B | MM minutes | C |
|
74 |
|
75 |
This table lists all Hindi training data seen by Kokoro, which totals about 6 hours.
|
76 |
+
|
77 |
+
### Japanese 🇯🇵
|
78 |
+
|
79 |
+
Japanese G2P: [`misaki[ja]`](https://github.com/hexgrad/misaki)
|
80 |
+
|
81 |
+
| Name | Traits | Target Quality | Training Duration | Overall Grade |
|
82 |
+
| ---- | ------ | -------------- | ----------------- | ------------- |
|
83 |
+
| jf_alpha | 🚺 | B | H hours | C+ |
|
84 |
+
|
85 |
+
### Mandarin Chinese 🇨🇳
|
86 |
+
|
87 |
+
Mandarin Chinese G2P: [`misaki[zh]`](https://github.com/hexgrad/misaki)
|
88 |
+
|
89 |
+
| Name | Traits | Target Quality | Training Duration | Overall Grade |
|
90 |
+
| ---- | ------ | -------------- | ----------------- | ------------- |
|
91 |
+
| zf_xiaobei | 🚺 | C | MM minutes | D |
|
92 |
+
| zf_xiaoni | 🚺 | C | MM minutes | D |
|
93 |
+
| zf_xiaoxiao | 🚺 | C | MM minutes | D |
|
94 |
+
| zf_xiaoyi | 🚺 | C | MM minutes | D |
|
95 |
+
| zm_yunjian | 🚹 | C | MM minutes | D |
|
96 |
+
| zm_yunxi | 🚹 | C | MM minutes | D |
|
97 |
+
| zm_yunxia | 🚹 | C | MM minutes | D |
|
98 |
+
| zm_yunyang | 🚹 | C | MM minutes | D |
|
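
The README comment `make sure lang_code matches voice`, together with the voice names in VOICES.md (`hm_psi`, `jf_alpha`, `zf_xiaobei`), suggests that the first letter of a voice name doubles as its language code. A minimal sketch of a pre-flight check built on that assumption follows; `LANG_CODES` and `lang_code_for_voice` are hypothetical names for illustration and are not part of the `kokoro` package.

```python
# Language codes as listed in the README comments of this diff.
LANG_CODES = {
    "a": "American English",
    "b": "British English",
    "f": "French",
    "h": "Hindi",
    "j": "Japanese",
    "z": "Mandarin Chinese",
}


def lang_code_for_voice(voice: str) -> str:
    """Return the lang_code implied by a voice name such as 'jf_alpha'.

    Assumption (not confirmed by the library): the first character of the
    voice name is the language code expected by KPipeline(lang_code=...).
    """
    if not voice:
        raise ValueError("empty voice name")
    code = voice[0]
    if code not in LANG_CODES:
        raise ValueError(f"unknown language prefix {code!r} in voice {voice!r}")
    return code
```

Under the same assumption, a caller could write `KPipeline(lang_code=lang_code_for_voice('zf_xiaobei'))` so that the pipeline language and the voice cannot drift apart.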