
	---
	license: openrail++
	---






	<p align="center">
	<img src="https://raw.githubusercontent.com/BudEcosystem/Tansen/main/Instagram%20post%20-%204.png" alt="Tansen Logo" width="300" height="300"/>
	</p>

	---

	<p align="center"><i>Democratizing access to LLMs, Multi-Modal Gen AI models for the open-source community.<br>Let's advance AI, together. </i></p>

	---


	Tansen is a text-to-speech program built with the following priorities:

	1. Strong multi-voice capabilities.
	2. Highly realistic prosody and intonation.
	3. Speaking rate control.


	<a href="https://github.com/BudEcosystem/Tansen"><img src="https://img.shields.io/badge/GitHub-100000?style=for-the-badge&logo=github&logoColor=white" /> </a>


	<h2 align="left">🎧 Demos </h2>




	[random_0_0.webm](https://github.com/BudEcosystem/Tansen/assets/4546714/9a6ce191-2646-497e-bf48-003f2bf0bb8d)

	[random_0_1.webm](https://github.com/BudEcosystem/Tansen/assets/4546714/87bf5f7c-ae47-4aa4-a110-b5c9899e4446)

	[random_0_2.webm](https://github.com/BudEcosystem/Tansen/assets/4546714/5549c464-c670-4e7a-987c-c5d79b32bf4b)

	<h2 align="left">💻 Getting Started on GitHub </h2>

	Ready to dive in? Here's how you can get started with our repo on GitHub.

	<h3 align="left">1️⃣ : Clone our GitHub repository</h3>

	First, set up a conda environment with the required packages and clone the repository. Open your terminal, navigate to the directory where you want the repository to be cloned, and run the following commands:

	```bash
	conda create --name Tansen python=3.9 numba inflect
	conda activate Tansen
	conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
	conda install transformers=4.29.2
	git clone https://github.com/BudEcosystem/Tansen.git
	cd Tansen
	```
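Before moving on, it can help to confirm that the packages installed above are visible from the new environment. The following is a minimal sketch using only the Python standard library; the helper name `check_packages` is hypothetical and not part of Tansen, and the package list simply mirrors the install commands above.

```python
from importlib.util import find_spec
from typing import Dict, Iterable


def check_packages(names: Iterable[str]) -> Dict[str, bool]:
    """Map each top-level package name to whether Python can locate it."""
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Import names of the packages installed by the conda commands above.
    required = ["torch", "torchaudio", "transformers", "numba", "inflect"]
    for name, found in check_packages(required).items():
        print(f"{name}: {'ok' if found else 'MISSING'}")
```

Run it with the `Tansen` environment activated; any package reported as `MISSING` was not installed into that environment.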

	<h3 align="left">2️⃣ : Install dependencies</h3>

	```bash
	python setup.py install
	```

	<h3 align="left">3️⃣ : Generate Audio</h3>

	### do_tts.py

	This script allows you to speak a single phrase with one or more voices.

	```shell
	python do_tts.py --text "I'm going to speak this" --voice random --preset fast
	```
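To speak several phrases in a row, the same command can be assembled from Python. This is a rough sketch, not part of Tansen: the helper names `build_do_tts_command` and `speak_all` are hypothetical, only the `--text`, `--voice`, and `--preset` flags shown above are used, and it assumes `do_tts.py` is in the current directory.

```python
import subprocess
from typing import List


def build_do_tts_command(text: str, voice: str = "random", preset: str = "fast") -> List[str]:
    """Build the argument list for the do_tts.py invocation shown above."""
    return ["python", "do_tts.py", "--text", text, "--voice", voice, "--preset", preset]


def speak_all(phrases: List[str]) -> None:
    """Run do_tts.py once per phrase, stopping at the first failure."""
    for phrase in phrases:
        # check=True raises CalledProcessError if the script exits with an error.
        subprocess.run(build_do_tts_command(phrase), check=True)
```

Passing the arguments as a list avoids shell quoting problems with apostrophes, as in "I'm going to speak this".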

	### read.py

	This script provides tools for reading large amounts of text.

	```shell
	python Tansen/read.py --textfile <your text to be read> --voice random
	```

	This breaks the text file into sentences and converts them to speech one at a time, writing each spoken clip
	as it is generated. Once all the clips are generated, it combines them into a single file and
	writes that as well.

	Sometimes Tansen produces a bad clip. You can regenerate any bad clips by re-running `read.py` with the `--regenerate`
	argument.
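The split, synthesize, and combine flow described above can be illustrated with a small sketch. This is not the code in `read.py`: the sentence splitter is a naive regular expression, and `fake_synthesize` is a placeholder standing in for the real text-to-speech call.

```python
import re
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    """Naively split text after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def read_text(text: str, synthesize: Callable[[str], List[float]]) -> List[float]:
    """Synthesize each sentence separately, then concatenate the clips."""
    clips = [synthesize(sentence) for sentence in split_sentences(text)]
    combined: List[float] = []
    for clip in clips:
        combined.extend(clip)
    return combined


def fake_synthesize(sentence: str) -> List[float]:
    """Placeholder for a TTS call: one silent sample per character."""
    return [0.0] * len(sentence)
```

Because each sentence is synthesized on its own, a single bad clip can be redone without repeating the whole text.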

	Interested in running it as an API?

	### 🐍 Usage in Python

	Tansen can be used programmatically:

	```python
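	# `api` and `utils` are modules from the installed Tansen package; the import
	# path is not shown here, so check the repository for the package name.
	# `clips_paths` is a list of paths to reference audio clips of the target voice.
	reference_clips = [utils.audio.load_audio(p, 22050) for p in clips_paths]  # load at 22,050 Hz
	tts = api.TextToSpeech(use_deepspeed=True, kv_cache=True, half=True)
	pcm_audio = tts.tts_with_preset("your text here", voice_samples=reference_clips, preset='fast')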
	```

	## Loss Curves

	<p align="center">
	<img src="https://raw.githubusercontent.com/BudEcosystem/Tansen/main/results/images/loss_mel_ce.png" alt="" width="500"/>
	<span>loss_mel_ce</span>
	</p>

	<p align="center">
	<img src="https://raw.githubusercontent.com/BudEcosystem/Tansen/main/results/images/loss_text_ce.png" alt="" width="500" />
	<span>loss_text_ce</span>
	</p>


	## Training Information

	Device: a single A100 GPU

	Dataset: 876 hours