Upload 10 files
- .gitattributes +1 -0
- MUSTC_ROOT_german/en-de/config_st.yaml +19 -0
- MUSTC_ROOT_german/en-de/data/tst-COMMON/txt/tst-COMMON.de +8 -0
- MUSTC_ROOT_german/en-de/data/tst-COMMON/txt/tst-COMMON.en +8 -0
- MUSTC_ROOT_german/en-de/data/tst-COMMON/txt/tst-COMMON.yaml +8 -0
- MUSTC_ROOT_german/en-de/data/tst-COMMON/wav/ted_1096.wav +3 -0
- MUSTC_ROOT_german/en-de/fbank80.zip +3 -0
- MUSTC_ROOT_german/en-de/spm_unigram8000_st.model +3 -0
- MUSTC_ROOT_german/en-de/spm_unigram8000_st.txt +0 -0
- MUSTC_ROOT_german/en-de/tst-COMMON_st.tsv +9 -0
- app.py +9 -36
.gitattributes
CHANGED
@@ -35,3 +35,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
 MUSTC_ROOT_french/en-fr/data/tst-COMMON/wav/ted_1096.wav filter=lfs diff=lfs merge=lfs -text
 MUSTC_ROOT_hindi/en-hi/data/tst-COMMON/wav/ted_1096.wav filter=lfs diff=lfs merge=lfs -text
+MUSTC_ROOT_german/en-de/data/tst-COMMON/wav/ted_1096.wav filter=lfs diff=lfs merge=lfs -text
MUSTC_ROOT_german/en-de/config_st.yaml
ADDED
@@ -0,0 +1,19 @@
+bpe_tokenizer:
+  bpe: sentencepiece
+  sentencepiece_model: ./spm_unigram8000_st.model
+input_channels: 1
+input_feat_per_channel: 80
+specaugment:
+  freq_mask_F: 27
+  freq_mask_N: 1
+  time_mask_N: 1
+  time_mask_T: 100
+  time_mask_p: 1.0
+  time_wrap_W: 0
+transforms:
+  '*':
+  - utterance_cmvn
+  _train:
+  - utterance_cmvn
+  - specaugment
+vocab_filename: spm_unigram8000_st.txt
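The `specaugment` block above configures frequency and time masking for the training transforms. As a rough illustration of what these parameters control, here is a minimal pure-Python sketch. It is not fairseq's implementation: the function name `spec_augment`, the choice of 0.0 as the fill value, and the reading of `time_mask_p` as a cap on the mask width relative to the utterance length are all assumptions made for this example.

```python
import random


def spec_augment(features, freq_mask_F=27, freq_mask_N=1,
                 time_mask_T=100, time_mask_N=1, time_mask_p=1.0, rng=None):
    """Return a masked copy of a [frames][bins] feature matrix (hypothetical helper)."""
    rng = rng or random.Random()
    n_frames = len(features)
    n_bins = len(features[0]) if n_frames else 0
    out = [row[:] for row in features]  # leave the input untouched

    # Frequency masking: zero out up to freq_mask_F consecutive bins, freq_mask_N times.
    for _ in range(freq_mask_N):
        width = rng.randint(0, min(freq_mask_F, n_bins))
        start = rng.randint(0, n_bins - width)
        for row in out:
            for b in range(start, start + width):
                row[b] = 0.0

    # Time masking: zero out up to time_mask_T consecutive frames, time_mask_N times.
    # Assumption: the width is also capped at time_mask_p * n_frames.
    max_width = min(time_mask_T, int(n_frames * time_mask_p))
    for _ in range(time_mask_N):
        width = rng.randint(0, max_width)
        start = rng.randint(0, n_frames - width)
        for t in range(start, start + width):
            out[t] = [0.0] * n_bins

    return out


if __name__ == "__main__":
    # 140 frames of 80-dimensional features, matching input_feat_per_channel: 80.
    feats = [[1.0] * 80 for _ in range(140)]
    masked = spec_augment(feats, rng=random.Random(0))
    print(len(masked), len(masked[0]))
```

With `time_wrap_W: 0` no time warping is configured, so the sketch omits it. In the config, `specaugment` appears only under `_train`, so it would not affect the `tst-COMMON` split used at inference.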
MUSTC_ROOT_german/en-de/data/tst-COMMON/txt/tst-COMMON.de
ADDED
@@ -0,0 +1,8 @@
+Zu Hause in New York, bin ich Chef der Entwicklungsabteilung einer gemeinnützigen Organisation namens Robin Hood.
+Wenn ich nicht die Armut bekämpfe, bekämpfe ich als Gehilfe eines Feuerwehr-Hauptmanns bei einem freiwilligen Löschzug das Feuer.
+Nun, in unserer Stadt, in der Freiwillige eine hochqualifizierte Berufsfeuerwehr unterstützten, muss man ziemlich früh an der Brandstelle sein, um mitmischen zu können.
+Ich erinnere mich an mein erstes Feuer.
+Ich war der zweite Freiwillige an der Brandstelle, ich hatte also recht gute Chancen hinein zu können.
+Aber es war immer noch ein Wettrennen gegen die anderen Freiwilligen um den verantwortlichen Hauptmann zu erreichen und herauszufinden was unsere Aufgaben sein würden.
+Als ich den Hauptmann fand, hatte er gerade in eine sehr ernste Unterhaltung mit der Hausbesitzerin, die sicherlich einen der schlimmsten Tage ihres Lebens hatte.
+Es war mitten in der Nacht und sie stand im Schlafanzug und barfuß unter einem Schirm draußen im strömenden Regen, während ihr Haus in Flammen stand.
MUSTC_ROOT_german/en-de/data/tst-COMMON/txt/tst-COMMON.en
ADDED
@@ -0,0 +1,8 @@
+Back in New York, I am the head of development for a non-profit called Robin Hood.
+When I'm not fighting poverty, I'm fighting fires as the assistant captain of a volunteer fire company.
+Now in our town, where the volunteers supplement a highly skilled career staff, you have to get to the fire scene pretty early to get in on any action.
+I remember my first fire.
+I was the second volunteer on the scene, so there was a pretty good chance I was going to get in.
+But still it was a real footrace against the other volunteers to get to the captain in charge to find out what our assignments would be.
+When I found the captain, he was having a very engaging conversation with the homeowner, who was surely having one of the worst days of her life.
+Here it was, the middle of the night, she was standing outside in the pouring rain, under an umbrella, in her pajamas, barefoot, while her house was in flames.
MUSTC_ROOT_german/en-de/data/tst-COMMON/txt/tst-COMMON.yaml
ADDED
@@ -0,0 +1,8 @@
+- {duration: 5.0, offset: 0.0, rW: 17, uW: 0, speaker_id: spk.1096, wav: test.wav}
+- {duration: 5.160000, offset: 20.290000, rW: 17, uW: 0, speaker_id: spk.1096, wav: ted_1096.wav}
+- {duration: 8.110000, offset: 25.930000, rW: 29, uW: 0, speaker_id: spk.1096, wav: ted_1096.wav}
+- {duration: 1.560000, offset: 34.920000, rW: 5, uW: 0, speaker_id: spk.1096, wav: ted_1096.wav}
+- {duration: 4.180000, offset: 36.730000, rW: 21, uW: 0, speaker_id: spk.1096, wav: ted_1096.wav}
+- {duration: 5.580000, offset: 41.880000, rW: 26, uW: 0, speaker_id: spk.1096, wav: ted_1096.wav}
+- {duration: 8.610001, offset: 48.309999, rW: 27, uW: 0, speaker_id: spk.1096, wav: ted_1096.wav}
+- {duration: 9.680000, offset: 57.510000, rW: 29, uW: 0, speaker_id: spk.1096, wav: ted_1096.wav}
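Each entry in this segment file gives an `offset` and `duration` in seconds into the named WAV file. The first entry points at `test.wav`, the file name that `app.py` copies the user's recording to. The sketch below shows how such an entry could be turned into a sample range. The 16 kHz sample rate is an assumption based on the `convert_audio_to_16k_wav` helper name in `app.py`, `sample_range` is a hypothetical helper, and the entries are hard-coded rather than parsed from YAML.

```python
SAMPLE_RATE = 16_000  # assumption: audio is resampled to 16 kHz before feature extraction

# Three entries copied from the segment file; only the fields used here are kept.
segments = [
    {"duration": 5.16, "offset": 20.29, "wav": "ted_1096.wav"},
    {"duration": 8.11, "offset": 25.93, "wav": "ted_1096.wav"},
    {"duration": 1.56, "offset": 34.92, "wav": "ted_1096.wav"},
]


def sample_range(segment, sample_rate=SAMPLE_RATE):
    """Return (start, end) sample indices for one segment entry (hypothetical helper)."""
    # round() guards against floating-point error in seconds * sample_rate.
    start = int(round(segment["offset"] * sample_rate))
    length = int(round(segment["duration"] * sample_rate))
    return start, start + length


if __name__ == "__main__":
    for seg in segments:
        print(seg["wav"], sample_range(seg))
```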
MUSTC_ROOT_german/en-de/data/tst-COMMON/wav/ted_1096.wav
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:69a122c3ad89320ec24cad84b622a01f26c3138b3e5869dc033e65bd0ab73fe1
+size 8990102
MUSTC_ROOT_german/en-de/fbank80.zip
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:0bef03a45d7514d5018c4de30d352c736359248e6e8d70d586796aa32b30f4e2
+size 5242360
MUSTC_ROOT_german/en-de/spm_unigram8000_st.model
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:1c1b53333e56d7dbf5adc03ffcbb3760de05def56f35e21e75b5702aeff38098
+size 379997
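The three binary files above are stored as Git LFS pointers: small text files with `version`, `oid`, and `size` fields that stand in for the real content. A minimal sketch of reading such a pointer follows; `parse_lfs_pointer` is a hypothetical helper written for this example, not part of any Git LFS tooling.

```python
def parse_lfs_pointer(text):
    """Parse the key/value lines of a Git LFS pointer file (hypothetical helper)."""
    # Each non-empty line is "<key> <value>", separated by a single space.
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


if __name__ == "__main__":
    # Pointer contents copied from the ted_1096.wav entry above.
    pointer = (
        "version https://git-lfs.github.com/spec/v1\n"
        "oid sha256:69a122c3ad89320ec24cad84b622a01f26c3138b3e5869dc033e65bd0ab73fe1\n"
        "size 8990102\n"
    )
    print(parse_lfs_pointer(pointer))
```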
MUSTC_ROOT_german/en-de/spm_unigram8000_st.txt
ADDED
The diff for this file is too large to render.
MUSTC_ROOT_german/en-de/tst-COMMON_st.tsv
ADDED
@@ -0,0 +1,9 @@
+id audio n_frames tgt_text speaker
+ted_1096_0 /home/deepakprasad/nlp_code/German_MUSTC/en-de/fbank80.zip:652123021:44928 140 Der Hauptmann winkte mich zu sich. spk.1096
+ted_1096_1 /home/deepakprasad/nlp_code/German_MUSTC/en-de/fbank80.zip:5396853429:272128 850 Er sagte, "Bezos, Sie müssen in das Haus gehen. Sie müssen nach oben gehen, an dem Feuer vorbei, und müssen dieser Frau ein Paar Schuhe holen." spk.1096
+ted_1096_2 /home/deepakprasad/nlp_code/German_MUSTC/en-de/fbank80.zip:41827555529:6208 19 (Gelächter) spk.1096
+ted_1096_3 /home/deepakprasad/nlp_code/German_MUSTC/en-de/fbank80.zip:46434264544:14848 46 Ich schwöre es. spk.1096
+ted_1096_4 /home/deepakprasad/nlp_code/German_MUSTC/en-de/fbank80.zip:13281223497:447168 1397 Nun, nicht genau das, was ich mir erhofft hatte, doch ich ging los -- die Treppen hoch, den Flur entlang, an den "echten" Feuerwehrmännern vorbei, die zu diesem Zeitpunkt mit dem Löschen schon so ziemlich fertig waren, in das Schlafzimmer um ein Paar Schuhe zu holen. spk.1096
+ted_1096_5 /home/deepakprasad/nlp_code/German_MUSTC/en-de/fbank80.zip:35969158042:152768 477 Ich weiß was Sie jetzt denken, aber ich bin kein Held. spk.1096
+ted_1096_6 /home/deepakprasad/nlp_code/German_MUSTC/en-de/fbank80.zip:41927428403:194048 606 Ich trug meine Beute zurück nach unten, wo ich an der Haustür meinen Erzfeind und den geliebten Hund traf. spk.1096
+ted_1096_7 /home/deepakprasad/nlp_code/German_MUSTC/en-de/fbank80.zip:5964728808:236928 740 Wir trugen unsere Schätze nach draußen zur Hausbesitzerin, wo, nicht überraschend, seiner wesentlich mehr Aufmerksamkeit bekam als meiner. spk.1096
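The `audio` column in this manifest packs three values into one string: a path to the feature archive, a byte offset, and a byte length. The sketch below splits such a reference and cross-checks the length against `n_frames`. Both helper names are hypothetical. The cross-check assumes each stored item is a float32 array of 80-dimensional frames preceded by a 128-byte header, which is consistent with the rows above but is not confirmed by the source.

```python
def parse_audio_ref(ref):
    """Split 'path.zip:offset:length' into (path, offset, length) (hypothetical helper)."""
    # rsplit from the right so that colons inside the path would not break parsing.
    path, offset, length = ref.rsplit(":", 2)
    return path, int(offset), int(length)


def implied_frames(length, n_bins=80, header_bytes=128, bytes_per_value=4):
    """Estimate the frame count from the stored byte length.

    Assumption: float32 values, n_bins features per frame, 128-byte header.
    """
    return (length - header_bytes) // (n_bins * bytes_per_value)


if __name__ == "__main__":
    ref = "/home/deepakprasad/nlp_code/German_MUSTC/en-de/fbank80.zip:652123021:44928"
    path, offset, length = parse_audio_ref(ref)
    print(path, offset, length, implied_frames(length))
```

Under these assumptions the first row's length of 44928 bytes corresponds to 140 frames, which matches its `n_frames` value.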
app.py
CHANGED
@@ -10,19 +10,7 @@ import sys
 import os
 import subprocess
 from pydub import AudioSegment
-import
-import wave
-
-
-
-def get_wav_duration(file_path):
-    with wave.open(file_path, 'rb') as wav_file:
-        frames = wav_file.getnframes()
-        rate = wav_file.getframerate()
-        duration = frames / float(rate)
-    return duration
-
-
+from huggingface_hub import snapshot_download
 
 def install_fairseq():
     try:
@@ -57,50 +45,35 @@ def run_my_code(input_text, language):
     audio=convert_audio_to_16k_wav(input_text)
     hi_wav = audio
 
-
     data_root=""
     model_checkpoint=""
     d_r=""
-    yam=""
 
     if(language=="Hindi"):
         model_checkpoint = "./models/hindi_model.pt"
         data_root="./MUSTC_ROOT_hindi/en-hi/"
         d_r="MUSTC_ROOT_hindi/"
-        yam="./MUSTC_ROOT_hindi/en-hi/data/tst-COMMON/txt/tst-COMMON.yaml"
     if(language=="French"):
         model_checkpoint = "./models/french_model.pt"
         data_root="./MUSTC_ROOT_french/en-fr/"
         d_r="MUSTC_ROOT_french/"
-        yam="./MUSTC_ROOT_french/en-fr/data/tst-COMMON/txt/tst-COMMON.yaml"
 
-
-    with open(yam, 'r') as yaml_file:
-        data = yaml.safe_load(yaml_file)
-    data[0]['duration']=get_wav_duration(hi_wav)
-    with open(yam, 'w') as yaml_file:
-        yaml.dump(data, yaml_file)
+
 
     os.system(f"cp {hi_wav} {data_root}data/tst-COMMON/wav/test.wav")
 
-    print("------Starting data prepration
+    print("------Starting data prepration...")
     subprocess.run(["python", "prep_mustc_data_hindi_single.py", "--data-root", d_r, "--task", "st", "--vocab-type", "unigram", "--vocab-size", "8000"], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
 
-    print("------Performing translation
+    print("------Performing translation...")
 
-    translation_result = subprocess.run(["
+    translation_result = subprocess.run(["fairseq-generate", data_root, "--config-yaml", "config_st.yaml", "--gen-subset", "tst-COMMON_st", "--task", "speech_to_text", "--path", model_checkpoint, "--max-tokens", "50000", "--beam", "5", "--scoring", "sacrebleu"], capture_output=True, text=True)
     translation_result_text = translation_result.stdout
 
     lines = translation_result_text.split("\n")
 
-
-    #just for checking the duration from the yaml file of the current input audio
-    with open(yam, 'r') as yaml_file:
-        data = yaml.safe_load(yaml_file)
-    print(data[0]['duration'], " seconds duration")
-
     output_text=""
-    print("\n\n------Translation results are
+    print("\n\n------Translation results are:")
     for i in lines:
         if (i.startswith("D-0")):
             print(i.split("\t")[2])
@@ -121,14 +94,14 @@ install_fairseq()
 #input_textbox = gr.inputs.Textbox(label="test2.wav")
 #input=gr.inputs.Audio(source="microphone", type="filepath", label="Record something (in English)...")
 #audio=convert_audio_to_16k_wav(input)
-output_textbox = gr.outputs.Textbox(label="
+output_textbox = gr.outputs.Textbox(label="Output Text")
 
 # Create a Gradio interface
 iface = gr.Interface(
     fn=run_my_code,
-    inputs=[gr.inputs.Audio(source="microphone", type="filepath", label="Record something (in
+    inputs=[gr.inputs.Audio(source="microphone", type="filepath", label="Record something (in English)..."), gr.inputs.Radio(["Hindi", "French"], label="Language")],
     outputs=output_textbox,
-    title="English to Hindi
+    title="English to Hindi Translator")
 
 # Launch the interface
 iface.launch()
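The loop in `run_my_code` scans the captured `fairseq-generate` output for lines starting with `D-0` and takes the third tab-separated field as the translation. The same step can be sketched as a standalone function, which makes it easier to test without running a model. `extract_hypotheses` is a hypothetical name, and the sample output in the demo is made up for illustration (the score and surrounding lines are not real results).

```python
def extract_hypotheses(generate_stdout, sample_id=0):
    """Collect hypothesis text from 'D-<id>' lines of generate output (hypothetical helper)."""
    # Including the tab in the prefix avoids matching ids that merely start with the same digits.
    prefix = f"D-{sample_id}\t"
    hypotheses = []
    for line in generate_stdout.splitlines():
        if line.startswith(prefix):
            parts = line.split("\t")
            # Assumed layout, as in app.py: tag, score, text.
            if len(parts) >= 3:
                hypotheses.append(parts[2])
    return hypotheses


if __name__ == "__main__":
    # Made-up sample output; the score is a placeholder, not a real model score.
    sample = (
        "T-0\tIch erinnere mich an mein erstes Feuer.\n"
        "D-0\t-0.5\tIch erinnere mich an mein erstes Feuer.\n"
        "D-10\t-0.7\tIch schwöre es.\n"
    )
    print(extract_hypotheses(sample))
```

Checking `len(parts)` before indexing avoids an `IndexError` on a malformed line, which the inline loop in `app.py` does not guard against.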