w11wo committed on
Commit e089318 · 1 Parent(s): 45b8e55

Updated README

Files changed (1)
  1. README.md +4 -4
README.md CHANGED
@@ -9,9 +9,9 @@ tags:
 inference: false
 datasets:
 - bookbot/sw-TZ-Victoria
-- bookbot/sw-TZ-Victoria-syllables
+- bookbot/sw-TZ-Victoria-syllables-word
 - bookbot/sw-TZ-Victoria-v2
-- bookbot/sw-TZ-VictoriaNeural
+- bookbot/sw-TZ-VictoriaNeural-upsampled-48kHz
 ---
 
 # LightSpeech MFA SW v4
@@ -19,9 +19,9 @@ datasets:
 LightSpeech MFA SW v4 is a text-to-mel-spectrogram model based on the [LightSpeech](https://arxiv.org/abs/2102.04040) architecture. This model was fine-tuned from [LightSpeech MFA SW v1](https://huggingface.co/bookbot/lightspeech-mfa-sw-v1) and trained on real and synthetic audio datasets. The list of speakers include:
 
 - sw-TZ-Victoria
-- sw-TZ-Victoria-syllables
+- sw-TZ-Victoria-syllables-word
 - sw-TZ-Victoria-v2
-- sw-TZ-VictoriaNeural
+- sw-TZ-VictoriaNeural-upsampled-48kHz
 
 We trained an acoustic Swahili model on our speech corpus using [Montreal Forced Aligner v3.0.0](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) and used it as the duration extractor. That model, and consequently our model, uses the IPA phone set for Swahili. We used [gruut](https://github.com/rhasspy/gruut) for phonemization purposes. We followed these [steps](https://github.com/TensorSpeech/TensorFlowTTS/tree/master/examples/mfa_extraction) to perform duration extraction.
 