XTTS v2 Fine-Tuned on Hindi Datasets

Model Name: XTTS v2 Fine-Tuned on Hindi Datasets

Model Description: This is a version of the XTTS v2 (Cross-lingual Text-to-Speech) model developed by Coqui AI, fine-tuned on Hindi speech datasets to generate more natural and accurate Hindi speech. It supports voice cloning and multilingual speech generation.

Colab Notebook

You can view the Colab notebook used for fine-tuning the XTTS v2 model on Hindi datasets and replicate the process by following this Colab Notebook Link.

Features

  • Languages: Supports 17 languages, including Hindi (hi).
  • Voice Cloning: Clone voices with just a 6-second audio clip.
  • Emotion and Style Transfer: Carry over the emotion and speaking style of the reference clip through cloning.
  • Cross-Language Voice Cloning: Supports voice cloning across different languages.
  • Sampling Rate: 24kHz sampling rate for high-quality audio.
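
As a rough illustration of the 6-second reference clip mentioned above, the sketch below measures the duration of a WAV file before it is used for cloning. The helper names and the choice to treat 6 seconds as a minimum are assumptions made for this example, not part of the XTTS code-base, and the standard-library wave module only reads uncompressed PCM WAV files.

```python
import wave

# Taken from the feature list above; treating it as a lower bound is an assumption.
MIN_REFERENCE_SECONDS = 6.0


def reference_clip_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_long_enough(path: str, minimum: float = MIN_REFERENCE_SECONDS) -> bool:
    """Check whether a reference clip meets the assumed minimum length."""
    return reference_clip_seconds(path) >= minimum
```

Running is_long_enough("/path/to/target/speaker.wav") before synthesis is one way to catch a clip that is too short to clone from.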

Updates over XTTS-v1

  • New Languages: Added support for Hungarian and Korean.
  • Architectural Improvements: Enhanced speaker conditioning and interpolation.
  • Stability Improvements: Better overall stability and performance.
  • Audio Quality: Improved prosody and audio quality.

Languages

The XTTS-v2 model supports the following 17 languages:

  • English (en)
  • Spanish (es)
  • French (fr)
  • German (de)
  • Italian (it)
  • Portuguese (pt)
  • Polish (pl)
  • Turkish (tr)
  • Russian (ru)
  • Dutch (nl)
  • Czech (cs)
  • Arabic (ar)
  • Chinese (zh-cn)
  • Japanese (ja)
  • Hungarian (hu)
  • Korean (ko)
  • Hindi (hi)
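
The codes in parentheses are the values passed as the language argument at synthesis time. A minimal sketch of validating a code before calling the model might look like this; the constant and function names are hypothetical and exist only for this example.

```python
# Language codes copied from the list above.
SUPPORTED_LANGUAGES = {
    "en", "es", "fr", "de", "it", "pt", "pl", "tr", "ru",
    "nl", "cs", "ar", "zh-cn", "ja", "hu", "ko", "hi",
}


def normalize_language(code: str) -> str:
    """Lower-case and trim a language code, rejecting unsupported ones."""
    key = code.strip().lower()
    if key not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Unsupported language code: {code!r}")
    return key
```

Failing early with a clear error is usually easier to debug than an error raised deep inside the synthesis call.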

Training Data

The model was fine-tuned on the following Hindi datasets:

  • Mozilla CommonVoice 18: A diverse dataset of Hindi speech.
  • IndicTTS Hindi Dataset: Hindi speech data for text-to-speech training.

Code

The codebase supports both inference and fine-tuning.

Demo Spaces

  • XTTS Space: Explore the model's performance on supported languages and try it with your own reference or microphone input.
  • XTTS Voice Chat with Mistral or Zephyr: Experience streaming voice chat with Mistral 7B Instruct or Zephyr 7B Beta.

License

This model is licensed under the Coqui Public Model License. Read more about the origin story of CPML here.

Contact

Join our 🐸 Community on Discord and follow us on Twitter. For inquiries, you can also email us at [email protected].

Usage

Using 🐸TTS API

from TTS.api import TTS

# Load the model by its Coqui model name.
# Note: this identifier refers to the base XTTS v2 release.
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2", gpu=True)

# Generate Hindi speech by cloning the voice in the reference clip, using default settings
tts.tts_to_file(
    text="नमस्ते, यह हिंदी में एक परीक्षण वाक्य है।",  # input text should match the language code
    file_path="output.wav",
    speaker_wav="/path/to/target/speaker.wav",
    language="hi"
)
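
The model name in the snippet above refers to the base XTTS v2 release. To run a fine-tuned checkpoint from a local directory instead, one option is the lower-level Xtts interface in Coqui TTS. The sketch below assumes the checkpoint directory contains config.json together with the model weights and vocabulary file in the layout that load_checkpoint expects. The function name and the directory layout are assumptions for illustration and are not confirmed by this model card.

```python
from pathlib import Path


def synthesize_with_checkpoint(checkpoint_dir, text, speaker_wav, language="hi"):
    """Load an XTTS checkpoint from a local directory and synthesize one utterance.

    Returns the generated waveform (XTTS produces 24 kHz audio).
    """
    # Imported here so this module can be imported without Coqui TTS installed.
    from TTS.tts.configs.xtts_config import XttsConfig
    from TTS.tts.models.xtts import Xtts

    checkpoint_dir = Path(checkpoint_dir)

    # Assumption: config.json sits next to the checkpoint files.
    config = XttsConfig()
    config.load_json(str(checkpoint_dir / "config.json"))

    model = Xtts.init_from_config(config)
    model.load_checkpoint(config, checkpoint_dir=str(checkpoint_dir), eval=True)

    outputs = model.synthesize(
        text,
        config,
        speaker_wav=speaker_wav,
        language=language,
    )
    return outputs["wav"]
```

The function runs on CPU as written; calling model.cuda() after load_checkpoint moves the model to a GPU when one is available.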