ylacombe committed on
Commit
a939f66
1 Parent(s): d011047

Create README.md

Files changed (1)
  1. README.md +76 -0
README.md ADDED
@@ -0,0 +1,76 @@
+ ---
+ library_name: transformers
+ pipeline_tag: text-to-speech
+ tags:
+ - transformers.js
+ - mms
+ - vits
+ license: cc-by-nc-4.0
+ datasets:
+ - ylacombe/google-argentinian-spanish
+ language:
+ - es
+ ---
+
+ ## Model
+
+ This is a fine-tuned version of the [Spanish version](https://huggingface.co/facebook/mms-tts-spa) of the Massively Multilingual Speech (MMS) models, which are lightweight, low-latency TTS models based on the [VITS architecture](https://huggingface.co/docs/transformers/model_doc/vits).
+
+ It was trained in around **20 minutes** on as few as **80 to 150 samples** from this [Argentinian Spanish dataset](https://huggingface.co/datasets/ylacombe/google-argentinian-spanish).
+
+ The training recipe is available in this [GitHub repository: **ylacombe/finetune-hf-vits**](https://github.com/ylacombe/finetune-hf-vits).
+
+
+ ## Usage
+
+ ### Transformers
+
+ ```python
+ from transformers import pipeline
+ import scipy.io.wavfile
+
+ # Load the fine-tuned checkpoint as a text-to-speech pipeline
+ model_id = "ylacombe/mms-spa-finetuned-argentinian-monospeaker"
+ synthesiser = pipeline("text-to-speech", model_id) # add device=0 if you want to use a GPU
+
+ # Synthesise speech; the result holds the waveform and its sampling rate
+ speech = synthesiser("Hola, ¿cómo estás hoy?")
+
+ # Save the waveform as a WAV file
+ scipy.io.wavfile.write("finetuned_output.wav", rate=speech["sampling_rate"], data=speech["audio"])
+ ```
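The snippet above hands the waveform straight to SciPy, which writes a floating-point WAV file when given floating-point data. Some players handle 16-bit PCM more reliably, so here is a minimal sketch of a 16-bit PCM writer that uses only the Python standard library. The helper name `write_pcm16_wav`, the demo file name, and the synthetic 440 Hz tone standing in for the model output are assumptions made for illustration; they are not part of the original recipe.

```python
import math
import struct
import wave


def write_pcm16_wav(path, samples, sampling_rate):
    """Write mono float samples in [-1.0, 1.0] as a 16-bit PCM WAV file."""
    frames = bytearray()
    for sample in samples:
        # Clip to the valid range so overshoot cannot wrap around
        clipped = max(-1.0, min(1.0, float(sample)))
        # Scale to signed 16-bit and pack as little-endian
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes per sample = 16 bits
        wav_file.setframerate(sampling_rate)
        wav_file.writeframes(bytes(frames))


# Hypothetical stand-in for the model output: one second of a 440 Hz tone at 16 kHz
demo_rate = 16000
demo_audio = [0.5 * math.sin(2 * math.pi * 440 * n / demo_rate) for n in range(demo_rate)]
write_pcm16_wav("pcm16_demo.wav", demo_audio, demo_rate)
```

With the pipeline output, a call might look like `write_pcm16_wav("finetuned_output.wav", speech["audio"].flatten(), speech["sampling_rate"])`, assuming `speech["audio"]` is a NumPy array. `flatten()` is used so the call does not depend on whether the array is one- or two-dimensional.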
+
+ ### Transformers.js
+
+ If you haven't already, you can install the [Transformers.js](https://huggingface.co/docs/transformers.js) JavaScript library from [NPM](https://www.npmjs.com/package/@xenova/transformers) using:
+ ```bash
+ npm i @xenova/transformers
+ ```
+
+ **Example:** Generate Spanish speech with `ylacombe/mms-spa-finetuned-argentinian-monospeaker`.
+ ```js
+ import { pipeline } from '@xenova/transformers';
+
+ // Create a text-to-speech pipeline
+ const synthesizer = await pipeline('text-to-speech', 'ylacombe/mms-spa-finetuned-argentinian-monospeaker', {
+   quantized: false, // Remove this line to use the quantized version (default)
+ });
+
+ // Generate speech
+ const output = await synthesizer('Hola, ¿cómo estás hoy?');
+ console.log(output);
+ // {
+ //   audio: Float32Array(69888) [ ... ],
+ //   sampling_rate: 16000
+ // }
+ ```
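The logged output gives enough information to work out how long the generated clip is: the number of samples divided by the sampling rate. A small sketch of that arithmetic in Python, using the two figures from the example output (69888 samples at 16000 Hz); the function name `duration_seconds` is hypothetical.

```python
def duration_seconds(num_samples: int, sampling_rate: int) -> float:
    """Return the clip length in seconds for a mono waveform."""
    return num_samples / sampling_rate


# Figures taken from the example output above
clip_seconds = duration_seconds(69888, 16000)
print(f"{clip_seconds:.3f} s")  # prints "4.368 s"
```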
+
+ Optionally, save the audio to a WAV file (Node.js):
+ ```js
+ import wavefile from 'wavefile';
+ import fs from 'fs';
+
+ const wav = new wavefile.WaveFile();
+ wav.fromScratch(1, output.sampling_rate, '32f', output.audio);
+ fs.writeFileSync('out.wav', wav.toBuffer());
+ ```
+
+
+ <audio controls src="https://cdn-uploads.huggingface.co/production/uploads/61b253b7ac5ecaae3d1efe0c/6FvN6zFSHGeenWS2-H8xv.wav"></audio>