---
datasets:
- mozilla-foundation/common_voice_17_0
- wasmdashai/db-arabic-f1-nn
language:
- ar
license: afl-3.0
pipeline_tag: text-to-speech
---
# Model Card for vits-ar
## Model Details
### Model Description
An advanced text-to-speech (TTS) system designed specifically for Arabic, built on the VITS architecture and using the pre-trained weights of Facebook's VITS Arabic (`ara`) model. The model is capable of:

- **Generating natural, realistic speech:** producing high-quality Arabic speech that closely mimics human voices and preserves intonation and linguistic nuance.
- **Understanding colloquial text:** processing text written in various Arabic dialects, including idiomatic expressions and local vocabulary.

VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech) is an end-to-end speech synthesis model that predicts a speech waveform conditioned on an input text sequence. It is a conditional variational autoencoder (VAE) composed of a posterior encoder, a decoder, and a conditional prior.

A set of spectrogram-based acoustic features is predicted by the flow-based module, which consists of a Transformer-based text encoder and multiple coupling layers. The spectrogram is decoded using a stack of transposed convolutional layers, much in the same style as the HiFi-GAN vocoder. Motivated by the one-to-many nature of the TTS problem, where the same text can be spoken in multiple ways, the model also includes a stochastic duration predictor, which allows it to synthesise speech with different rhythms from the same input text.
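
As a rough illustration of why a stochastic duration predictor yields different rhythms for the same text, the toy sketch below samples a duration for each token with random noise and then repeats each token for that many frames. This is not the VITS implementation, which uses a learned flow-based predictor; the function names `sample_durations` and `expand`, the parameters `noise_scale_duration` and `speaking_rate`, and all numbers are assumptions chosen for illustration only.

```python
import math
import random


def sample_durations(mean_log_durations, noise_scale_duration=0.8,
                     speaking_rate=1.0, seed=None):
    """Toy stand-in for a stochastic duration predictor (illustrative only)."""
    rng = random.Random(seed)
    durations = []
    for mean in mean_log_durations:
        # Perturb the log-duration with Gaussian noise, then map it to frames.
        log_d = mean + noise_scale_duration * rng.gauss(0.0, 1.0)
        frames = math.ceil(math.exp(log_d) / speaking_rate)
        durations.append(max(1, frames))  # every token gets at least one frame
    return durations


def expand(tokens, durations):
    """Repeat each token for its sampled number of frames."""
    frames = []
    for token, duration in zip(tokens, durations):
        frames.extend([token] * duration)
    return frames


# Hypothetical token sequence and mean log-durations (made-up values).
tokens = ["s", "a", "l", "a", "m"]
mean_log_durations = [1.0, 1.5, 1.0, 1.5, 1.2]

take_1 = sample_durations(mean_log_durations, seed=1)
take_2 = sample_durations(mean_log_durations, seed=2)
print(take_1, len(expand(tokens, take_1)))
print(take_2, len(expand(tokens, take_2)))
```

Different seeds will usually give different per-token durations and therefore a different total number of frames for the same text, while setting the noise scale to zero makes the toy predictor deterministic.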
## Usage
MMS-TTS is available in the 🤗 Transformers library from version 4.33 onwards. To use this checkpoint, first install the latest version of the library:
```
pip install --upgrade "transformers[torch]"
```
Then run inference with the following code snippet:
```python
from transformers import VitsModel, AutoTokenizer
import torch

# Load the fine-tuned Arabic checkpoint and its tokenizer
model = VitsModel.from_pretrained("wasmdashai/vits-ar")
tokenizer = AutoTokenizer.from_pretrained("wasmdashai/vits-ar")

text = "السلام عليكم ورحمة الله وبركاته ما الجديد ؟"
inputs = tokenizer(text, return_tensors="pt")

# Inference only, so gradients are not needed
with torch.no_grad():
    full_generation = model(**inputs)

# Flatten the model output into a 1-D NumPy array of audio samples
full_generation_waveform = full_generation.waveform.cpu().numpy().reshape(-1)

# Play the audio inline in a Jupyter notebook
from IPython.display import Audio
Audio(full_generation_waveform, rate=model.config.sampling_rate)
```
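
Outside a notebook there is no inline audio player, so one option is to write the waveform to a WAV file. The sketch below uses only the Python standard library. `save_wav` is a hypothetical helper, not part of Transformers, and the sine tone is a stand-in for `full_generation_waveform` so that the snippet runs without downloading the model. It assumes mono float samples in the range [-1, 1].

```python
import math
import struct
import wave


def save_wav(path, samples, sampling_rate):
    """Write mono float samples in [-1, 1] to a 16-bit PCM WAV file."""
    pcm = bytearray()
    for sample in samples:
        # Clip to the valid range, then scale to a signed 16-bit integer.
        clipped = max(-1.0, min(1.0, float(sample)))
        pcm += struct.pack("<h", int(clipped * 32767))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)      # mono
        wav_file.setsampwidth(2)      # 2 bytes = 16 bits per sample
        wav_file.setframerate(sampling_rate)
        wav_file.writeframes(bytes(pcm))


# Stand-in signal: one second of a 440 Hz tone at an assumed 16 kHz rate.
rate = 16000
tone = [0.3 * math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)]
save_wav("vits_ar_sample.wav", tone, rate)
```

With the inference snippet above, the equivalent call would be `save_wav("vits_ar_sample.wav", full_generation_waveform, model.config.sampling_rate)`.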
## Contact
You can email us at [email protected]
## Arabic Dialect Generation Model Collection
### Introduction
We are pleased to announce the upcoming release of a collection of Arabic dialect generation models. These models were designed using advanced artificial intelligence techniques to deliver a natural, realistic text-to-speech (TTS) experience across the various Arabic dialects.
### Model Table
| **Dialect** | **Model name** | **Description** | **Expected release date** | **Audio quality level** |
|-------------|----------------|-----------------|---------------------------|-------------------------|
| Arabic | [vits-ar](https://huggingface.co/wasmdashai/vits-ar) | Text-to-speech model for the Yemeni dialect with fine detail. | Available | Medium |
| Yemeni | [vits-ar-ye](https://huggingface.co/wasmdashai/vits-ar-ye) | Text-to-speech model for the Yemeni dialect with fine detail. | Coming soon | Medium |
| Saudi | [vits-ar-sa](https://huggingface.co/wasmdashai/vits-ar-sa-huba) | Text-to-speech model for the Saudi dialect with high quality and fine detail. | Available | Medium |
| Egyptian | [vits-ar-eg](https://huggingface.co/wasmdashai/vits-ar-eg) | Text-to-speech model for the Egyptian dialect in a natural, smooth style. | Coming soon | Medium |
| Lebanese | [vits-ar-lb](https://huggingface.co/wasmdashai/vits-ar-lb) | Model specialized in the Lebanese dialect that generates speech with fine, realistic detail. | Coming soon | Medium |
| Moroccan | [vits-ar-ma](https://huggingface.co/wasmdashai/vits-ar-ma) | Text-to-speech model for the Moroccan dialect, able to understand local terms. | Coming soon | Medium |
| Emirati | [vits-ar-ae](https://huggingface.co/wasmdashai/vits-ar-ae) | Text-to-speech model for the Emirati dialect with realism and fine detail. | Coming soon | Medium |
| Jordanian | [vits-ar-jo](https://huggingface.co/wasmdashai/vits-ar-jo) | Text-to-speech model for the Jordanian dialect with mastery of acoustic detail. | Coming soon | Medium |
| Iraqi | [vits-ar-iq](https://huggingface.co/wasmdashai/vits-ar-iq) | Speech generation model for the Iraqi dialect with accurate pronunciation of common words and expressions. | Coming soon | Medium |
| Syrian | [vits-ar-sy](https://huggingface.co/wasmdashai/vits-ar-sy) | Text-to-speech model for the Syrian dialect with clarity and a natural voice. | Coming soon | Medium |
| Palestinian | [vits-ar-ps](https://huggingface.co/wasmdashai/vits-ar-ps) | Text-to-speech model for the Palestinian dialect with fine detail. | Coming soon | Medium |
| Sudanese | [vits-ar-sd](https://huggingface.co/wasmdashai/vits-ar-sd) | Text-to-speech model for the Sudanese dialect with an understanding of local vocabulary. | Coming soon | Medium |
| Algerian | [vits-ar-dz](https://huggingface.co/wasmdashai/vits-ar-dz) | Text-to-speech model for the Algerian dialect with accuracy and high quality. | Coming soon | Medium |
| Tunisian | [vits-ar-tn](https://huggingface.co/wasmdashai/vits-ar-tn) | Text-to-speech model for the Tunisian dialect with mastery of local detail. | Coming soon | Medium |
| Libyan | [vits-ar-ly](https://huggingface.co/wasmdashai/vits-ar-ly) | Text-to-speech model for the Libyan dialect with accurate, realistic pronunciation. | Coming soon | Medium |
| Bahraini | [vits-ar-bh](https://huggingface.co/wasmdashai/vits-ar-bh) | Text-to-speech model for the Bahraini dialect with high audio quality. | Coming soon | Medium |
| Omani | [vits-ar-om](https://huggingface.co/wasmdashai/vits-ar-om) | Text-to-speech model for the Omani dialect with accurate, clear pronunciation. | Coming soon | Medium |
| Qatari | [vits-ar-qa](https://huggingface.co/wasmdashai/vits-ar-qa) | Text-to-speech model for the Qatari dialect with fine, realistic detail. | Coming soon | Medium |
| Kuwaiti | [vits-ar-kw](https://huggingface.co/wasmdashai/vits-ar-kw) | Text-to-speech model for the Kuwaiti dialect with high quality and clarity. | Coming soon | Medium |
| Mauritanian | [vits-ar-mr](https://huggingface.co/wasmdashai/vits-ar-mr) | Text-to-speech model for the Mauritanian dialect with fine, realistic detail. | Coming soon | Medium |
### Technical Details
All models are based on the VITS architecture, an end-to-end text-to-speech model that generates realistic audio waveforms from text input. The models contain Transformer-based components that convert text and generate speech according to the local acoustic characteristics of each dialect.
### Future Upgrades
Ongoing updates are planned to improve audio quality and how well the models handle the different dialects. Follow us to learn more about the exact release dates for each model.
## Acknowledgements
This implementation is based on [tts-arabic](https://github.com/nipponjo/tts-arabic-pytorch), [VITS](https://github.com/jaywalnut310/vits), [Finetune VITS](https://github.com/ylacombe/finetune-hf-vits) and [Bert-VITS2](https://github.com/fishaudio/Bert-VITS2). We appreciate their awesome work.