---
language:
- ja
base_model:
- parler-tts/parler-tts-mini-v1
- retrieva-jp/t5-base-long
pipeline_tag: text-to-speech
library_name: transformers
tags:
- text-to-speech
- annotation
- japanese
license: other
---



# Japanese Parler-TTS Mini 

This repository publishes a model retrained from [parler-tts/parler-tts-mini-v1](https://huggingface.co/parler-tts/parler-tts-mini-v1) to support Japanese text-to-speech. The model is lightweight while still generating high-quality speech.

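**Note**: This model is not compatible with the tokenizer used by the original [Parler-TTS](https://huggingface.co/collections/parler-tts/parler-tts-fully-open-source-high-quality-tts-66164ad285ba03e8ffde214c). It uses its own custom tokenizers instead (`prompt_tokenizer` and `description_tokenizer`, loaded from subfolders of this repository).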


---


## Japanese Parler-TTS Index

- [Japanese Parler-TTS Mini](https://huggingface.co/2121-8/japanese-parler-tts-mini)
- Japanese Parler-TTS Large (will be trained if compute resources allow)


---


## 📖 Quick Index
* [👨‍💻 Installation](#👨‍💻-installation)
* [🎲 Usage with a Random Voice](#🎲-usage-with-a-random-voice)
* [🎯 Specifying a Particular Speaker](#🎯-specifying-a-particular-speaker)

---

## 🛠️ Usage

### 👨‍💻 Installation

Install the required packages with the following commands:

```sh
# Parler-TTS library
pip install git+https://github.com/huggingface/parler-tts.git
# RubyInserter (provides add_ruby, used in the examples below)
pip install git+https://github.com/getuka/RubyInserter.git
```

---

### 🎲 Usage with a Random Voice

```python
import torch
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer
import soundfile as sf
from rubyinserter import add_ruby

# Use the GPU if available, otherwise fall back to the CPU
device = "cuda:0" if torch.cuda.is_available() else "cpu"

# Load the model and the two tokenizers bundled with this repository
# (they are not interchangeable with the original Parler-TTS tokenizer)
model = ParlerTTSForConditionalGeneration.from_pretrained("2121-8/japanese-parler-tts-mini").to(device)
prompt_tokenizer = AutoTokenizer.from_pretrained("2121-8/japanese-parler-tts-mini", subfolder="prompt_tokenizer")
description_tokenizer = AutoTokenizer.from_pretrained("2121-8/japanese-parler-tts-mini", subfolder="description_tokenizer")

# Japanese text to speak ("Hello, how are you spending your day today?")
prompt = "こんにちは、今日はどのようにお過ごしですか？"
# Generic English description of the desired voice (no speaker name)
description = "A female speaker with a slightly high-pitched voice delivers her words at a moderate speed with a quite monotone tone in a confined environment, resulting in a quite clear audio recording."

# Add ruby annotations to the prompt with RubyInserter, then tokenize both inputs
prompt = add_ruby(prompt)
input_ids = description_tokenizer(description, return_tensors="pt").input_ids.to(device)
prompt_input_ids = prompt_tokenizer(prompt, return_tensors="pt").input_ids.to(device)

generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
# Convert the generated waveform to a 1-D NumPy array and save it as a WAV file
audio_arr = generation.cpu().numpy().squeeze()
sf.write("parler_tts_japanese_out.wav", audio_arr, model.config.sampling_rate)
```


### Audio Samples

<audio controls>
  <source src="https://huggingface.co/2121-8/japanese-parler-tts-mini/resolve/main/audio/normal_sample_1.wav" type="audio/wav">
  Your browser does not support the audio tag.
</audio>
<br>
<audio controls>
  <source src="https://huggingface.co/2121-8/japanese-parler-tts-mini/resolve/main/audio/normal_sample_2.wav" type="audio/wav">
  Your browser does not support the audio tag.
</audio>
<br>
<audio controls>
  <source src="https://huggingface.co/2121-8/japanese-parler-tts-mini/resolve/main/audio/normal_sample_3.wav" type="audio/wav">
  Your browser does not support the audio tag.
</audio>

---

### 🎯 Specifying a Particular Speaker
Training data used for this speaker: [JSUT](https://sites.google.com/site/shinnosuketakamichi/publication/jsut)

```python
import torch
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer
import soundfile as sf
from rubyinserter import add_ruby

device = "cuda:0" if torch.cuda.is_available() else "cpu"

# Load the model and the two tokenizers bundled with this repository
# (they are not interchangeable with the original Parler-TTS tokenizer)
model = ParlerTTSForConditionalGeneration.from_pretrained("2121-8/japanese-parler-tts-mini").to(device)
prompt_tokenizer = AutoTokenizer.from_pretrained("2121-8/japanese-parler-tts-mini", subfolder="prompt_tokenizer")
description_tokenizer = AutoTokenizer.from_pretrained("2121-8/japanese-parler-tts-mini", subfolder="description_tokenizer")

# Japanese text to speak ("Hello, how are you spending your day today?")
prompt = "こんにちは、今日はどのようにお過ごしですか？"
# Starting the description with the speaker name "JSUT" requests that speaker's voice
description = "JSUT speaks with an expressive and animated tone in an excellent recording, with a very close-sounding proximity that suggests a private and intimate setting, and delivers her words at a rapid pace."

# Add ruby annotations to the prompt with RubyInserter, then tokenize both inputs
prompt = add_ruby(prompt)
input_ids = description_tokenizer(description, return_tensors="pt").input_ids.to(device)
prompt_input_ids = prompt_tokenizer(prompt, return_tensors="pt").input_ids.to(device)

generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
# Convert the generated waveform to a 1-D NumPy array and save it as a WAV file
audio_arr = generation.cpu().numpy().squeeze()
sf.write("parler_tts_japanese_out.wav", audio_arr, model.config.sampling_rate)
```
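The two scripts above differ only in the `description` string: the random-voice example describes a generic female speaker, while this one starts with the speaker name `JSUT`. When experimenting, it can be handy to assemble descriptions from a few attributes. The sketch below is a hypothetical helper (`build_description` is not part of Parler-TTS or this repository) that reuses the wording of the random-voice description; it assumes, without confirmation from the author, that descriptions phrased like the examples steer the model best.

```python
from typing import Optional


def build_description(
    speaker: Optional[str] = None,
    pitch: str = "slightly high-pitched",
    speed: str = "a moderate speed",
    tone: str = "a quite monotone tone",
    environment: str = "a confined environment",
    clarity: str = "quite clear",
) -> str:
    """Hypothetical helper (not part of Parler-TTS or this repository).

    Composes an English voice description following the wording of the
    random-voice example in this model card.
    """
    # With no name, use the generic subject from the random-voice example;
    # otherwise start the sentence with the speaker name (e.g. "JSUT").
    subject = speaker if speaker else "A female speaker"
    return (
        f"{subject} with a {pitch} voice delivers her words at {speed} "
        f"with {tone} in {environment}, resulting in a {clarity} audio recording."
    )


if __name__ == "__main__":
    # Reproduces the description string used in the random-voice example
    print(build_description())
    # A JSUT description with a faster delivery
    print(build_description(speaker="JSUT", speed="a rapid pace"))
```

The returned string can be assigned to `description` in either script. The JSUT example above uses different wording (proximity and recording quality rather than pitch), so whether descriptions built this way work as intended is worth checking by listening to the output.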

### Audio Samples

<audio controls>
  <source src="https://huggingface.co/2121-8/japanese-parler-tts-mini/resolve/main/audio/jsut%20sample_1.wav" type="audio/wav">
  Your browser does not support the audio tag.
</audio>
<br>
<audio controls>
  <source src="https://huggingface.co/2121-8/japanese-parler-tts-mini/resolve/main/audio/jsut%20sample_2.wav" type="audio/wav">
  Your browser does not support the audio tag.
</audio>
<br>
<audio controls>
  <source src="https://huggingface.co/2121-8/japanese-parler-tts-mini/resolve/main/audio/jsut%20sample_3.wav" type="audio/wav">
  Your browser does not support the audio tag.
</audio>
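Both examples synthesize a single short sentence. For longer Japanese text, one possible approach (an assumption, not a procedure documented for this model) is to split the text at sentence-ending punctuation and run each sentence through the same `add_ruby`, tokenize, and `model.generate` steps. A minimal sketch of the splitting step, using only the standard library (`split_sentences` is a hypothetical helper):

```python
import re
from typing import List

# Zero-width split point right after sentence-ending punctuation
# (Japanese 。！？ and ASCII !?), so the punctuation stays with its sentence.
_SENTENCE_END = re.compile(r"(?<=[。！？!?])")


def split_sentences(text: str) -> List[str]:
    """Hypothetical helper: split Japanese text into sentences so that each
    one can be synthesized separately."""
    parts = _SENTENCE_END.split(text)
    # Drop empty pieces (e.g. after the final punctuation) and surrounding whitespace
    return [part.strip() for part in parts if part.strip()]


if __name__ == "__main__":
    text = "こんにちは。今日はいい天気ですね！散歩に行きませんか？"
    for sentence in split_sentences(text):
        print(sentence)
```

The per-sentence arrays returned by `generation.cpu().numpy().squeeze()` could then be joined with `numpy.concatenate` before calling `sf.write`. With the generic description of the random-voice example, consecutive sentences may not share the same voice, so the named-speaker setup is likely the better fit for this approach.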

---





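### Copyright and Usage Disclaimer

This model and repository may be used for a wide range of purposes, including research, education, and commercial use. However, the following conditions must be observed.

1. **Conditions for commercial use**  
   You may use audio and other outputs generated with this model for commercial purposes, but selling the model itself (its files, weights, etc.) is prohibited.

2. **Disclaimer of suitability**  
   The author makes no guarantee as to the accuracy, legality, or appropriateness of any results obtained by using this model.

3. **User responsibility**  
   When using this model, comply with all applicable laws and regulations. Users bear full responsibility for any content they generate.

4. **Author disclaimer**  
   The author of this repository and model accepts no liability for copyright infringement or other legal issues.

5. **Handling of removal requests**  
   If a copyright issue arises, the problematic resources or data will be removed promptly.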

---