File size: 5,780 Bytes
618a16c 864bac8 618a16c 4a6a8ed daac3db 4a6a8ed f48a184 4a6a8ed 0439302 4a6a8ed e3b844b 4a6a8ed 6bafe60 21e7170 e3b844b daac3db 21e7170 fbba31e e2b4e50 10319fe e3b844b 10319fe e2b4e50 2e24252 9871713 2e24252 9218e57 2e24252 9218e57 2e24252 daac3db 2e24252 9218e57 2e24252 9218e57 2e24252 9218e57 2e24252 55bb9b0 9e17464 2e24252 e3b844b 2e24252 41e5892 2e24252 10319fe 3cd207c 9e17464 4d24040 9e17464 ee030eb |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 |
---
license: apache-2.0
language:
- en
base_model:
- yl4579/StyleTTS2-LJSpeech
pipeline_tag: text-to-speech
---
**Kokoro** is an open-weight TTS model with 82 million parameters. Despite its lightweight architecture, it delivers comparable quality to larger models while being significantly faster and more cost-efficient. With Apache-licensed weights, Kokoro can be deployed anywhere from production environments to personal projects.
<audio controls><source src="https://huggingface.co/hexgrad/Kokoro-82M/resolve/main/samples/HEARME.wav" type="audio/wav"></audio>
🐈 **GitHub**: https://github.com/hexgrad/kokoro
🚀 **Demo**: https://hf.co/spaces/hexgrad/Kokoro-TTS
🚫 Beware of Fake Websites: Sites like kokorottsai (snapshot at https://archive.ph/nRRnk) are deceptive boilerplates that exploit model popularity to harvest emails. The model uploader does not own any domain containing "kokoro". Be careful who you email.
♻️ This is an Apache-licensed model, and Kokoro has been deployed in numerous projects and commercial APIs. Although we condemn fake websites, we welcome and encourage deployment of the model in real use cases.
- [Releases](#releases)
- [Usage](#usage)
- [EVAL.md](https://huggingface.co/hexgrad/Kokoro-82M/blob/main/EVAL.md) ↗️
- [SAMPLES.md](https://huggingface.co/hexgrad/Kokoro-82M/blob/main/SAMPLES.md) ↗️
- [VOICES.md](https://huggingface.co/hexgrad/Kokoro-82M/blob/main/VOICES.md) ↗️
- [Model Facts](#model-facts)
- [Training Details](#training-details)
- [Creative Commons Attribution](#creative-commons-attribution)
- [Acknowledgements](#acknowledgements)
### Releases
| Model | Published | Training Data | Langs & Voices | SHA256 |
| ----- | --------- | ------------- | -------------- | ------ |
| **v1.0** | **2025 Jan 27** | **Few hundred hrs** | [**8 & 54**](https://huggingface.co/hexgrad/Kokoro-82M/blob/main/VOICES.md) | `496dba11` |
| [v0.19](https://huggingface.co/hexgrad/kLegacy/tree/main/v0.19) | 2024 Dec 25 | <100 hrs | 1 & 10 | `3b0c392f` |
| Training Costs | v0.19 | v1.0 | **Total** |
| -------------- | ----- | ---- | ----- |
| in A100 80GB GPU hours | 500 | 500 | **1000** |
| average hourly rate | $0.80/h | $1.20/h | **$1/h** |
| in USD | $400 | $600 | **$1000** |
### Usage
You can run this basic cell on [Google Colab](https://colab.research.google.com/). [Listen to samples](https://huggingface.co/hexgrad/Kokoro-82M/blob/main/SAMPLES.md). For more languages and details, see [Advanced Usage](https://github.com/hexgrad/kokoro?tab=readme-ov-file#advanced-usage).
```py
!pip install -q kokoro>=0.9.2 soundfile
!apt-get -qq -y install espeak-ng > /dev/null 2>&1
from kokoro import KPipeline
from IPython.display import display, Audio
import soundfile as sf
import torch
pipeline = KPipeline(lang_code='a')
text = '''
[Kokoro](/kˈOkəɹO/) is an open-weight TTS model with 82 million parameters. Despite its lightweight architecture, it delivers comparable quality to larger models while being significantly faster and more cost-efficient. With Apache-licensed weights, [Kokoro](/kˈOkəɹO/) can be deployed anywhere from production environments to personal projects.
'''
generator = pipeline(text, voice='af_heart')
for i, (gs, ps, audio) in enumerate(generator):
print(i, gs, ps)
display(Audio(data=audio, rate=24000, autoplay=i==0))
sf.write(f'{i}.wav', audio, 24000)
```
Under the hood, `kokoro` uses [`misaki`](https://pypi.org/project/misaki/), a G2P library at https://github.com/hexgrad/misaki
### Model Facts
**Architecture:**
- StyleTTS 2: https://arxiv.org/abs/2306.07691
- ISTFTNet: https://arxiv.org/abs/2203.02395
- Decoder only: no diffusion, no encoder release
**Architected by:** Li et al @ https://github.com/yl4579/StyleTTS2
**Trained by**: `@rzvzn` on Discord
**Languages:** Multiple
**Model SHA256 Hash:** `496dba118d1a58f5f3db2efc88dbdc216e0483fc89fe6e47ee1f2c53f18ad1e4`
### Training Details
**Data:** Kokoro was trained exclusively on **permissive/non-copyrighted audio data** and IPA phoneme labels. Examples of permissive/non-copyrighted audio include:
- Public domain audio
- Audio licensed under Apache, MIT, etc
- Synthetic audio<sup>[1]</sup> generated by closed<sup>[2]</sup> TTS models from large providers<br/>
[1] https://copyright.gov/ai/ai_policy_guidance.pdf<br/>
[2] No synthetic audio from open TTS models or "custom voice clones"
**Total Dataset Size:** A few hundred hours of audio
**Total Training Cost:** About $1000 for 1000 hours of A100 80GB vRAM
### Creative Commons Attribution
The following CC BY audio was part of the dataset used to train Kokoro v1.0.
| Audio Data | Duration Used | License | Added to Training Set After |
| ---------- | ------------- | ------- | --------------------------- |
| [Koniwa](https://github.com/koniwa/koniwa) `tnc` | <1h | [CC BY 3.0](https://creativecommons.org/licenses/by/3.0/deed.ja) | v0.19 / 22 Nov 2024 |
| [SIWIS](https://datashare.ed.ac.uk/handle/10283/2353) | <11h | [CC BY 4.0](https://datashare.ed.ac.uk/bitstream/handle/10283/2353/license_text) | v0.19 / 22 Nov 2024 |
### Acknowledgements
- 🛠️ [@yl4579](https://huggingface.co/yl4579) for architecting StyleTTS 2.
- 🏆 [@Pendrokar](https://huggingface.co/Pendrokar) for adding Kokoro as a contender in the TTS Spaces Arena.
- 📊 Thank you to everyone who contributed synthetic training data.
- ❤️ Special thanks to all compute sponsors.
- 👾 Discord server: https://discord.gg/QuGxSWBfQy
- 🪽 Kokoro is a Japanese word that translates to "heart" or "spirit". Kokoro is also the name of an [AI in the Terminator franchise](https://terminator.fandom.com/wiki/Kokoro).
<img src="https://static0.gamerantimages.com/wordpress/wp-content/uploads/2024/08/terminator-zero-41-1.jpg" width="400" alt="kokoro" />
|