SpeechT5 Text To Speech

DON'T USE THIS. This model failed to train. However, someone saw my code and asked me to share the model, so I put it up.

import torch
from transformers import AutoTokenizer, SpeechT5HifiGan, SpeechT5ForTextToSpeech

# Load the pretrained HiFi-GAN vocoder, plus this repo's tokenizer and TTS model.
vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")
tokenizer = AutoTokenizer.from_pretrained("Bingsu/speecht5_test")
model = SpeechT5ForTextToSpeech.from_pretrained("Bingsu/speecht5_test")

# Download the speaker-embedding weights and load them into an nn.Embedding table.
emb_url = "https://huggingface.co/Bingsu/speecht5_test/resolve/main/speaker_embedding.pt"
emb_sd = torch.hub.load_state_dict_from_url(emb_url, map_location="cpu")
emb = torch.nn.Embedding(model.config.num_speakers, model.config.speaker_embedding_dim)
emb.load_state_dict(emb_sd)


@torch.inference_mode()
def gen(text: str, speaker_id: int = 0):
    inputs = tokenizer(text, return_tensors="pt")
    s_id = torch.tensor(speaker_id)

    # Look up the embedding for the requested speaker and add a batch dimension.
    speaker_embeddings = emb(s_id).unsqueeze(0)
    # Generate speech; passing the vocoder makes generate_speech return a waveform.
    speech = model.generate_speech(inputs.input_ids, speaker_embeddings=speaker_embeddings, vocoder=vocoder)
    return speech.numpy()
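
To listen to the result, the array returned by `gen` can be written to a WAV file. The following is a minimal sketch, not part of the original code: `save_wav` and `SAMPLE_RATE` are hypothetical names introduced here, and the 16 kHz rate is an assumption based on the `microsoft/speecht5_hifigan` vocoder, so adjust it if your output sounds sped up or slowed down. It uses only the standard library and accepts any iterable of floats, including a NumPy array.

```python
import struct
import wave

# Assumption: the SpeechT5 HiFi-GAN vocoder produces 16 kHz mono audio.
SAMPLE_RATE = 16_000


def save_wav(samples, path, sample_rate=SAMPLE_RATE):
    """Write a mono float waveform in [-1, 1] to a 16-bit PCM WAV file.

    Returns the number of samples written.
    """
    # Clip each sample to [-1, 1], then scale to the signed 16-bit range.
    pcm = [int(max(-1.0, min(1.0, float(s))) * 32767) for s in samples]
    with wave.open(path, "wb") as f:
        f.setnchannels(1)   # mono
        f.setsampwidth(2)   # 2 bytes per sample = 16-bit PCM
        f.setframerate(sample_rate)
        f.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))
    return len(pcm)


# Hypothetical usage with the gen() function defined above:
# save_wav(gen("some text", speaker_id=0), "out.wav")
```

The per-sample Python loop is slow for long clips; it is kept here only to avoid extra dependencies.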