---
datasets:
- mozilla-foundation/common_voice_17_0
- wasmdashai/db-arabic-f1-nn
language:
- ar
license: afl-3.0
pipeline_tag: text-to-speech
---

# Model Card for wasmdashai/vits-ar

## Model Details

### Model Description

This model is a text-to-speech (TTS) system for Arabic. It is built on the VITS architecture and uses the pre-trained weights of Facebook's VITS `ara` model. The model is designed for:

- **Natural, realistic speech:** producing Arabic speech that closely resembles a human voice and preserves intonation and linguistic nuance.
- **Colloquial text:** processing text written in various Arabic dialects, including idiomatic expressions and local vocabulary.

### Architecture

VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech) is an end-to-end speech synthesis model that predicts a speech waveform conditioned on an input text sequence. It is a conditional variational autoencoder (VAE) composed of a posterior encoder, a decoder, and a conditional prior.

A set of spectrogram-based acoustic features is predicted by the flow-based module, which consists of a Transformer-based text encoder and multiple coupling layers. The spectrogram is decoded using a stack of transposed convolutional layers, in much the same style as the HiFi-GAN vocoder. Because TTS is a one-to-many problem, where the same text can be spoken in multiple ways, the model also includes a stochastic duration predictor, which lets it synthesise speech with different rhythms from the same input text.

## Usage

MMS-TTS is available in the 🤗 Transformers library from version 4.33 onwards. To use this checkpoint, first install the latest version of the library:

```
pip install --upgrade "transformers[torch]"
```
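
Because VITS support depends on the installed Transformers release, a quick version check can catch an outdated environment before the checkpoint is loaded. The following is a minimal sketch using only the standard library. The helper names `parse_major_minor` and `supports_vits` are hypothetical; only the 4.33 threshold comes from the text above.

```python
import re
from importlib.metadata import PackageNotFoundError, version

MIN_VERSION = (4, 33)  # minimum Transformers release stated in this card


def parse_major_minor(version_string):
    """Return (major, minor) from a version string such as '4.33.2'."""
    match = re.match(r"(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"Unrecognised version string: {version_string!r}")
    return int(match.group(1)), int(match.group(2))


def supports_vits(version_string):
    """Return True if the given Transformers version is 4.33 or newer."""
    return parse_major_minor(version_string) >= MIN_VERSION


try:
    installed = version("transformers")
    print(f"transformers {installed}: VITS supported = {supports_vits(installed)}")
except PackageNotFoundError:
    print("transformers is not installed")
```

If the check reports an older release, re-running the install command above should bring the library up to date.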

Then, run inference with the following code snippet:

```python
import torch
from IPython.display import Audio
from transformers import AutoTokenizer, VitsModel

# Load the checkpoint and its tokenizer from the Hugging Face Hub.
model = VitsModel.from_pretrained("wasmdashai/vits-ar")
tokenizer = AutoTokenizer.from_pretrained("wasmdashai/vits-ar")

text = "السلام عليكم ورحمة الله وبركاتة ما الجديد ؟"
inputs = tokenizer(text, return_tensors="pt")

# Inference only, so gradients are not needed.
with torch.no_grad():
    full_generation = model(**inputs)

# Flatten the model output to a 1-D NumPy array of audio samples.
full_generation_waveform = full_generation.waveform.cpu().numpy().reshape(-1)

# Play the audio inline (in a Jupyter notebook).
Audio(full_generation_waveform, rate=model.config.sampling_rate)
```
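
Outside a notebook, the generated waveform usually needs to be written to disk rather than played inline. The sketch below shows one way to do that with Python's standard `wave` module, assuming the waveform is a mono sequence of floats in the range [-1.0, 1.0]. The helper `save_wav` and the demo tone are hypothetical stand-ins so that the example runs without downloading the model.

```python
import math
import os
import struct
import tempfile
import wave


def save_wav(path, samples, sample_rate):
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file."""
    frames = bytearray()
    for sample in samples:
        clipped = max(-1.0, min(1.0, float(sample)))  # guard against overshoot
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)  # mono
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav_file.setframerate(int(sample_rate))
        wav_file.writeframes(bytes(frames))


# Stand-in input (an assumption for illustration): 0.1 s of a 440 Hz tone at 16 kHz.
demo_rate = 16000
demo_samples = [0.5 * math.sin(2 * math.pi * 440 * n / demo_rate) for n in range(1600)]
demo_path = os.path.join(tempfile.gettempdir(), "demo_tone.wav")
save_wav(demo_path, demo_samples, demo_rate)
```

With the inference snippet above, a call such as `save_wav("speech.wav", full_generation_waveform, model.config.sampling_rate)` would store the synthesised speech. For long clips, a vectorised writer such as `scipy.io.wavfile.write` is likely to be faster.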

## Contact

You can email us at [email protected]

## Arabic Dialect Speech Generation Models

### Introduction

We are pleased to announce the upcoming release of a collection of Arabic dialect speech generation models. These models are designed using advanced artificial intelligence techniques to deliver a natural, realistic text-to-speech experience across the various Arabic dialects.

### Model Table

| **Dialect** | **Model name** | **Description** | **Expected release date** | **Audio quality level** |
|---|---|---|---|---|
| Arabic | [vits-ar](https://huggingface.co/wasmdashai/vits-ar) | Text-to-speech model for the Yemeni dialect with fine detail. | Available | Medium |
| Yemeni | [vits-ar-ye](https://huggingface.co/wasmdashai/vits-ar-ye) | Text-to-speech model for the Yemeni dialect with fine detail. | Soon | Medium |
| Saudi | [vits-ar-sa](https://huggingface.co/wasmdashai/vits-ar-sa-huba) | Text-to-speech model for the Saudi dialect with high quality and fine detail. | Available | Medium |
| Egyptian | [vits-ar-eg](https://huggingface.co/wasmdashai/vits-ar-eg) | Text-to-speech model for the Egyptian dialect in a natural, smooth style. | Soon | Medium |
| Lebanese | [vits-ar-lb](https://huggingface.co/wasmdashai/vits-ar-lb) | Model specialised in the Lebanese dialect, generating speech with fine, realistic detail. | Soon | Medium |
| Moroccan | [vits-ar-ma](https://huggingface.co/wasmdashai/vits-ar-ma) | Text-to-speech model for the Moroccan dialect that can handle local terminology. | Soon | Medium |
| Emirati | [vits-ar-ae](https://huggingface.co/wasmdashai/vits-ar-ae) | Text-to-speech model for the Emirati dialect with realism and fine detail. | Soon | Medium |
| Jordanian | [vits-ar-jo](https://huggingface.co/wasmdashai/vits-ar-jo) | Text-to-speech model for the Jordanian dialect with careful attention to vocal detail. | Soon | Medium |
| Iraqi | [vits-ar-iq](https://huggingface.co/wasmdashai/vits-ar-iq) | Speech generation model for the Iraqi dialect with accurate pronunciation of common words and expressions. | Soon | Medium |
| Syrian | [vits-ar-sy](https://huggingface.co/wasmdashai/vits-ar-sy) | Text-to-speech model for the Syrian dialect with clarity and a natural voice. | Soon | Medium |
| Palestinian | [vits-ar-ps](https://huggingface.co/wasmdashai/vits-ar-ps) | Text-to-speech model for the Palestinian dialect with fine detail. | Soon | Medium |
| Sudanese | [vits-ar-sd](https://huggingface.co/wasmdashai/vits-ar-sd) | Text-to-speech model for the Sudanese dialect that understands local vocabulary. | Soon | Medium |
| Algerian | [vits-ar-dz](https://huggingface.co/wasmdashai/vits-ar-dz) | Text-to-speech model for the Algerian dialect with accuracy and high quality. | Soon | Medium |
| Tunisian | [vits-ar-tn](https://huggingface.co/wasmdashai/vits-ar-tn) | Text-to-speech model for the Tunisian dialect with careful attention to local detail. | Soon | Medium |
| Libyan | [vits-ar-ly](https://huggingface.co/wasmdashai/vits-ar-ly) | Text-to-speech model for the Libyan dialect with accurate, realistic pronunciation. | Soon | Medium |
| Bahraini | [vits-ar-bh](https://huggingface.co/wasmdashai/vits-ar-bh) | Text-to-speech model for the Bahraini dialect with high audio quality. | Soon | Medium |
| Omani | [vits-ar-om](https://huggingface.co/wasmdashai/vits-ar-om) | Text-to-speech model for the Omani dialect with accurate, clear pronunciation. | Soon | Medium |
| Qatari | [vits-ar-qa](https://huggingface.co/wasmdashai/vits-ar-qa) | Text-to-speech model for the Qatari dialect with fine, realistic detail. | Soon | Medium |
| Kuwaiti | [vits-ar-kw](https://huggingface.co/wasmdashai/vits-ar-kw) | Text-to-speech model for the Kuwaiti dialect with high quality and clarity. | Soon | Medium |
| Mauritanian | [vits-ar-mr](https://huggingface.co/wasmdashai/vits-ar-mr) | Text-to-speech model for the Mauritanian dialect with fine, realistic detail. | Soon | Medium |

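Since only some of the checkpoints in the table are marked as available, code that selects a model by dialect may want a fallback. The sketch below is one possible approach, not part of the released models. The dialect codes (`"ar"`, `"sa"`) are assumptions derived from the repository name suffixes, the repository names are copied from the table links, and the helper `resolve_checkpoint` is hypothetical.

```python
HUB_NAMESPACE = "wasmdashai"

# Repository names taken from the links in the model table; only the
# entries marked "Available" there are treated as released.
RELEASED_CHECKPOINTS = {
    "ar": "vits-ar",          # Arabic (base model)
    "sa": "vits-ar-sa-huba",  # Saudi dialect
}
DEFAULT_DIALECT = "ar"


def resolve_checkpoint(dialect_code):
    """Return the Hub repo id for a dialect, falling back to the base model."""
    fallback = RELEASED_CHECKPOINTS[DEFAULT_DIALECT]
    name = RELEASED_CHECKPOINTS.get(dialect_code.lower(), fallback)
    return f"{HUB_NAMESPACE}/{name}"


print(resolve_checkpoint("sa"))  # released: wasmdashai/vits-ar-sa-huba
print(resolve_checkpoint("eg"))  # not released yet: falls back to wasmdashai/vits-ar
```

The returned id could then be passed to `VitsModel.from_pretrained` and `AutoTokenizer.from_pretrained` in place of a hard-coded repository name. Falling back silently is a design choice; raising an error for unreleased dialects would be an equally reasonable alternative.
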
### Technical Details

All models are based on the VITS architecture, an end-to-end text-to-speech model that generates realistic audio waveforms from text input. The models include analyzers that process the text and generate speech according to the local vocal characteristics of each dialect.

### Future Upgrades

Regular updates will be released to improve audio quality and the models' handling of the different dialects. Follow us to learn more about the exact release dates for each model.

## Acknowledgements

This implementation is based on [tts-arabic](https://github.com/nipponjo/tts-arabic-pytorch), [VITS](https://github.com/jaywalnut310/vits), [Finetune VITS](https://github.com/ylacombe/finetune-hf-vits) and [Bert-VITS2](https://github.com/fishaudio/Bert-VITS2). We appreciate their awesome work.