Spaces: balaramas/s2t_translator

balaramas committed on Jul 4, 2023

Commit 6cc98e6
1 Parent(s): 22d4e6b

Upload 10 files


Files changed (11)

.gitattributes +1 -0
MUSTC_ROOT_german/en-de/config_st.yaml +19 -0
MUSTC_ROOT_german/en-de/data/tst-COMMON/txt/tst-COMMON.de +8 -0
MUSTC_ROOT_german/en-de/data/tst-COMMON/txt/tst-COMMON.en +8 -0
MUSTC_ROOT_german/en-de/data/tst-COMMON/txt/tst-COMMON.yaml +8 -0
MUSTC_ROOT_german/en-de/data/tst-COMMON/wav/ted_1096.wav +3 -0
MUSTC_ROOT_german/en-de/fbank80.zip +3 -0
MUSTC_ROOT_german/en-de/spm_unigram8000_st.model +3 -0
MUSTC_ROOT_german/en-de/spm_unigram8000_st.txt +0 -0
MUSTC_ROOT_german/en-de/tst-COMMON_st.tsv +9 -0
app.py +9 -36

.gitattributes CHANGED

@@ -35,3 +35,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
 MUSTC_ROOT_french/en-fr/data/tst-COMMON/wav/ted_1096.wav filter=lfs diff=lfs merge=lfs -text
 MUSTC_ROOT_hindi/en-hi/data/tst-COMMON/wav/ted_1096.wav filter=lfs diff=lfs merge=lfs -text
+MUSTC_ROOT_german/en-de/data/tst-COMMON/wav/ted_1096.wav filter=lfs diff=lfs merge=lfs -text

MUSTC_ROOT_german/en-de/config_st.yaml ADDED

	@@ -0,0 +1,19 @@

+bpe_tokenizer:
+  bpe: sentencepiece
+  sentencepiece_model: ./spm_unigram8000_st.model
+input_channels: 1
+input_feat_per_channel: 80
+specaugment:
+  freq_mask_F: 27
+  freq_mask_N: 1
+  time_mask_N: 1
+  time_mask_T: 100
+  time_mask_p: 1.0
+  time_wrap_W: 0
+transforms:
+  '*':
+  - utterance_cmvn
+  _train:
+  - utterance_cmvn
+  - specaugment
+vocab_filename: spm_unigram8000_st.txt
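
The `transforms` section above keys lists of feature transforms by split name. As a rough illustration of how such a lookup might resolve, the sketch below assumes the order "exact split name, then `_train` or `_eval`, then `*`", which is believed to match fairseq's speech-to-text data config but is not confirmed by this commit. The helper `resolve_transforms` and the split name `train_st` are hypothetical.

```python
from typing import Dict, List

# Mirrors the `transforms` section of config_st.yaml in this commit.
TRANSFORMS: Dict[str, List[str]] = {
    "*": ["utterance_cmvn"],
    "_train": ["utterance_cmvn", "specaugment"],
}


def resolve_transforms(
    transforms: Dict[str, List[str]], split: str, is_train: bool
) -> List[str]:
    """Pick the transform list for a split (assumed lookup order, see lead-in)."""
    # 1. An entry named exactly after the split wins.
    chosen = transforms.get(split)
    # 2. Otherwise fall back to the train or eval wildcard.
    if chosen is None:
        chosen = transforms.get("_train" if is_train else "_eval")
    # 3. Finally fall back to the general wildcard.
    if chosen is None:
        chosen = transforms.get("*")
    return chosen or []


if __name__ == "__main__":
    print(resolve_transforms(TRANSFORMS, "tst-COMMON_st", is_train=False))
    print(resolve_transforms(TRANSFORMS, "train_st", is_train=True))
```

Under that assumption, the `tst-COMMON_st` subset used by `app.py` would only receive `utterance_cmvn`, while SpecAugment would apply to training splits only.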

MUSTC_ROOT_german/en-de/data/tst-COMMON/txt/tst-COMMON.de ADDED

	@@ -0,0 +1,8 @@

+Zu Hause in New York, bin ich Chef der Entwicklungsabteilung einer gemeinnützigen Organisation namens Robin Hood.
+Wenn ich nicht die Armut bekämpfe, bekämpfe ich als Gehilfe eines Feuerwehr-Hauptmanns bei einem freiwilligen Löschzug das Feuer.
+Nun, in unserer Stadt, in der Freiwillige eine hochqualifizierte Berufsfeuerwehr unterstützten, muss man ziemlich früh an der Brandstelle sein, um mitmischen zu können.
+Ich erinnere mich an mein erstes Feuer.
+Ich war der zweite Freiwillige an der Brandstelle, ich hatte also recht gute Chancen hinein zu können.
+Aber es war immer noch ein Wettrennen gegen die anderen Freiwilligen um den verantwortlichen Hauptmann zu erreichen und herauszufinden was unsere Aufgaben sein würden.
+Als ich den Hauptmann fand, hatte er gerade in eine sehr ernste Unterhaltung mit der Hausbesitzerin, die sicherlich einen der schlimmsten Tage ihres Lebens hatte.
+Es war mitten in der Nacht und sie stand im Schlafanzug und barfuß unter einem Schirm draußen im strömenden Regen, während ihr Haus in Flammen stand.

MUSTC_ROOT_german/en-de/data/tst-COMMON/txt/tst-COMMON.en ADDED

	@@ -0,0 +1,8 @@

+Back in New York, I am the head of development for a non-profit called Robin Hood.
+When I'm not fighting poverty, I'm fighting fires as the assistant captain of a volunteer fire company.
+Now in our town, where the volunteers supplement a highly skilled career staff, you have to get to the fire scene pretty early to get in on any action.
+I remember my first fire.
+I was the second volunteer on the scene, so there was a pretty good chance I was going to get in.
+But still it was a real footrace against the other volunteers to get to the captain in charge to find out what our assignments would be.
+When I found the captain, he was having a very engaging conversation with the homeowner, who was surely having one of the worst days of her life.
+Here it was, the middle of the night, she was standing outside in the pouring rain, under an umbrella, in her pajamas, barefoot, while her house was in flames.

MUSTC_ROOT_german/en-de/data/tst-COMMON/txt/tst-COMMON.yaml ADDED

	@@ -0,0 +1,8 @@

+- {duration: 5.0, offset: 0.0, rW: 17, uW: 0, speaker_id: spk.1096, wav: test.wav}
+- {duration: 5.160000, offset: 20.290000, rW: 17, uW: 0, speaker_id: spk.1096, wav: ted_1096.wav}
+- {duration: 8.110000, offset: 25.930000, rW: 29, uW: 0, speaker_id: spk.1096, wav: ted_1096.wav}
+- {duration: 1.560000, offset: 34.920000, rW: 5, uW: 0, speaker_id: spk.1096, wav: ted_1096.wav}
+- {duration: 4.180000, offset: 36.730000, rW: 21, uW: 0, speaker_id: spk.1096, wav: ted_1096.wav}
+- {duration: 5.580000, offset: 41.880000, rW: 26, uW: 0, speaker_id: spk.1096, wav: ted_1096.wav}
+- {duration: 8.610001, offset: 48.309999, rW: 27, uW: 0, speaker_id: spk.1096, wav: ted_1096.wav}
+- {duration: 9.680000, offset: 57.510000, rW: 29, uW: 0, speaker_id: spk.1096, wav: ted_1096.wav}
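
Each entry in the segment list above gives an `offset` and a `duration` in seconds into the named wav file. A minimal sketch of turning one entry into a sample range follows. The 16 kHz rate is an assumption (suggested only by the name `convert_audio_to_16k_wav` in `app.py`), `segment_to_samples` is a hypothetical helper, and rounding is used here to avoid floating-point truncation; the actual prep script may truncate instead.

```python
from typing import Dict, Tuple, Union

Number = Union[int, float]


def segment_to_samples(
    segment: Dict[str, Number], sample_rate: int = 16_000
) -> Tuple[int, int]:
    """Return (start_sample, num_samples) for one segment entry.

    `sample_rate` defaults to 16 kHz, which is an assumption, not something
    stated in the YAML itself.
    """
    start = round(float(segment["offset"]) * sample_rate)
    length = round(float(segment["duration"]) * sample_rate)
    return start, length


if __name__ == "__main__":
    # Second entry of tst-COMMON.yaml in this commit.
    entry = {"duration": 5.16, "offset": 20.29}
    print(segment_to_samples(entry))
```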

MUSTC_ROOT_german/en-de/data/tst-COMMON/wav/ted_1096.wav ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:69a122c3ad89320ec24cad84b622a01f26c3138b3e5869dc033e65bd0ab73fe1
+size 8990102
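
The three added lines above are a Git LFS pointer rather than the audio itself: a spec version, an object ID with its hash algorithm, and the object size in bytes. As a small illustration, the sketch below parses such a pointer into a dictionary. `parse_lfs_pointer` is a hypothetical helper, not part of this repository.

```python
from typing import Dict, Union


def parse_lfs_pointer(text: str) -> Dict[str, Union[str, int]]:
    """Parse a Git LFS pointer file made of `key value` lines."""
    fields = {}
    for line in text.strip().splitlines():
        key, value = line.split(" ", 1)
        fields[key] = value
    # The oid line has the form `<hash algorithm>:<hex digest>`.
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


if __name__ == "__main__":
    pointer = (
        "version https://git-lfs.github.com/spec/v1\n"
        "oid sha256:69a122c3ad89320ec24cad84b622a01f26c3138b3e5869dc033e65bd0ab73fe1\n"
        "size 8990102\n"
    )
    print(parse_lfs_pointer(pointer))
```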

MUSTC_ROOT_german/en-de/fbank80.zip ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:0bef03a45d7514d5018c4de30d352c736359248e6e8d70d586796aa32b30f4e2
+size 5242360

MUSTC_ROOT_german/en-de/spm_unigram8000_st.model ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:1c1b53333e56d7dbf5adc03ffcbb3760de05def56f35e21e75b5702aeff38098
+size 379997

MUSTC_ROOT_german/en-de/spm_unigram8000_st.txt ADDED

The diff for this file is too large to render. See raw diff

MUSTC_ROOT_german/en-de/tst-COMMON_st.tsv ADDED

	@@ -0,0 +1,9 @@

+id	audio	n_frames	tgt_text	speaker
+ted_1096_0	/home/deepakprasad/nlp_code/German_MUSTC/en-de/fbank80.zip:652123021:44928	140	Der Hauptmann winkte mich zu sich.	spk.1096
+ted_1096_1	/home/deepakprasad/nlp_code/German_MUSTC/en-de/fbank80.zip:5396853429:272128	850	Er sagte, "Bezos, Sie müssen in das Haus gehen. Sie müssen nach oben gehen, an dem Feuer vorbei, und müssen dieser Frau ein Paar Schuhe holen."	spk.1096
+ted_1096_2	/home/deepakprasad/nlp_code/German_MUSTC/en-de/fbank80.zip:41827555529:6208	19	(Gelächter)	spk.1096
+ted_1096_3	/home/deepakprasad/nlp_code/German_MUSTC/en-de/fbank80.zip:46434264544:14848	46	Ich schwöre es.	spk.1096
+ted_1096_4	/home/deepakprasad/nlp_code/German_MUSTC/en-de/fbank80.zip:13281223497:447168	1397	Nun, nicht genau das, was ich mir erhofft hatte, doch ich ging los -- die Treppen hoch, den Flur entlang, an den "echten" Feuerwehrmännern vorbei, die zu diesem Zeitpunkt mit dem Löschen schon so ziemlich fertig waren, in das Schlafzimmer um ein Paar Schuhe zu holen.	spk.1096
+ted_1096_5	/home/deepakprasad/nlp_code/German_MUSTC/en-de/fbank80.zip:35969158042:152768	477	Ich weiß was Sie jetzt denken, aber ich bin kein Held.	spk.1096
+ted_1096_6	/home/deepakprasad/nlp_code/German_MUSTC/en-de/fbank80.zip:41927428403:194048	606	Ich trug meine Beute zurück nach unten, wo ich an der Haustür meinen Erzfeind und den geliebten Hund traf.	spk.1096
+ted_1096_7	/home/deepakprasad/nlp_code/German_MUSTC/en-de/fbank80.zip:5964728808:236928	740	Wir trugen unsere Schätze nach draußen zur Hausbesitzerin, wo, nicht überraschend, seiner wesentlich mehr Aufmerksamkeit bekam als meiner.	spk.1096
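
The `audio` column above appears to follow a `path:byte_offset:byte_length` layout pointing into the feature archive; that reading is an assumption based on the values shown. The sketch below splits such a field and checks whether the referenced byte range would fit inside an archive of a given size. `parse_audio_field` and `fits_in_archive` are hypothetical helpers.

```python
from typing import Tuple


def parse_audio_field(field: str) -> Tuple[str, int, int]:
    """Split `path:offset:length`, splitting from the right so colons in the path survive."""
    path, offset, length = field.rsplit(":", 2)
    return path, int(offset), int(length)


def fits_in_archive(offset: int, length: int, archive_size: int) -> bool:
    """True if the byte range [offset, offset + length) lies inside the archive."""
    return 0 <= offset and offset + length <= archive_size


if __name__ == "__main__":
    field = "/home/deepakprasad/nlp_code/German_MUSTC/en-de/fbank80.zip:652123021:44928"
    path, offset, length = parse_audio_field(field)
    # 5242360 is the size recorded in the LFS pointer for fbank80.zip in this commit.
    print(path, offset, length, fits_in_archive(offset, length, 5_242_360))
```

If the layout assumption holds, the offsets listed in this manifest exceed the 5242360-byte size recorded for the uploaded `fbank80.zip`, which suggests the manifest was generated against a different archive and would need to be regenerated by the prep step before use.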

app.py CHANGED

@@ -10,19 +10,7 @@ import sys
 import os
 import subprocess
 from pydub import AudioSegment
-import yaml
-import wave
-def get_wav_duration(file_path):
-    with wave.open(file_path, 'rb') as wav_file:
-        frames = wav_file.getnframes()
-        rate = wav_file.getframerate()
-        duration = frames / float(rate)
-        return duration
+from huggingface_hub import snapshot_download
 def install_fairseq():
     try:
@@ -57,50 +45,35 @@ def run_my_code(input_text, language):
     audio=convert_audio_to_16k_wav(input_text)
     hi_wav = audio
     data_root=""
     model_checkpoint=""
     d_r=""
-    yam=""
     if(language=="Hindi"):
         model_checkpoint = "./models/hindi_model.pt"
         data_root="./MUSTC_ROOT_hindi/en-hi/"
         d_r="MUSTC_ROOT_hindi/"
-        yam="./MUSTC_ROOT_hindi/en-hi/data/tst-COMMON/txt/tst-COMMON.yaml"
     if(language=="French"):
         model_checkpoint = "./models/french_model.pt"
         data_root="./MUSTC_ROOT_french/en-fr/"
         d_r="MUSTC_ROOT_french/"
-        yam="./MUSTC_ROOT_french/en-fr/data/tst-COMMON/txt/tst-COMMON.yaml"
-    #code to change the duration of the yaml file accordign to the audio input
-    with open(yam, 'r') as yaml_file:
-        data = yaml.safe_load(yaml_file)
-    data[0]['duration']=get_wav_duration(hi_wav)
-    with open(yam, 'w') as yaml_file:
-        yaml.dump(data, yaml_file)
     os.system(f"cp {hi_wav} {data_root}data/tst-COMMON/wav/test.wav")
-    print("------Starting data prepration------")
+    print("------Starting data prepration...")
     subprocess.run(["python", "prep_mustc_data_hindi_single.py", "--data-root", d_r, "--task", "st", "--vocab-type", "unigram", "--vocab-size", "8000"], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
-    print("------Performing translation------")
-    translation_result = subprocess.run(["python", "generate.py", data_root, "--config-yaml", "config_st.yaml", "--gen-subset", "tst-COMMON_st", "--task", "speech_to_text", "--path", model_checkpoint], capture_output=True, text=True)
+    print("------Performing translation...")
+    translation_result = subprocess.run(["fairseq-generate", data_root, "--config-yaml", "config_st.yaml", "--gen-subset", "tst-COMMON_st", "--task", "speech_to_text", "--path", model_checkpoint, "--max-tokens", "50000", "--beam", "5", "--scoring", "sacrebleu"], capture_output=True, text=True)
     translation_result_text = translation_result.stdout
     lines = translation_result_text.split("\n")
-    #just for checking the duration from the yaml file of the current input audio
-    with open(yam, 'r') as yaml_file:
-        data = yaml.safe_load(yaml_file)
-    print(data[0]['duration'], " seconds duration")
     output_text=""
-    print("\n\n------Translation results are:\n")
+    print("\n\n------Translation results are:")
     for i in lines:
         if (i.startswith("D-0")):
             print(i.split("\t")[2])
@@ -121,14 +94,14 @@ install_fairseq()
 #input_textbox = gr.inputs.Textbox(label="test2.wav")
 #input=gr.inputs.Audio(source="microphone", type="filepath", label="Record something (in English)...")
 #audio=convert_audio_to_16k_wav(input)
-output_textbox = gr.outputs.Textbox(label="The Translated Text is:")
+output_textbox = gr.outputs.Textbox(label="Output Text")
 # Create a Gradio interface
 iface = gr.Interface(
         fn=run_my_code,
-        inputs=[gr.inputs.Audio(source="microphone", type="filepath", label="Record something (in American/British English Accent)..."), gr.inputs.Radio(["Hindi", "French"], label="Language")],
+        inputs=[gr.inputs.Audio(source="microphone", type="filepath", label="Record something (in English)..."), gr.inputs.Radio(["Hindi", "French"], label="Language")],
         outputs=output_textbox,
-        title="English to Hindi/French Translator")
+        title="English to Hindi Translator")
 # Launch the interface
 iface.launch()
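
`run_my_code` picks the translation out of the generation output by looking for lines that start with `D-0` and taking the third tab-separated field. A minimal sketch of that step as a standalone function is shown below. The sample output text is made up for illustration, `extract_hypotheses` is a hypothetical helper, and the `D-<id>\t<score>\t<text>` line layout is inferred from how the diff indexes the fields.

```python
from typing import List


def extract_hypotheses(stdout: str, sample_id: int = 0) -> List[str]:
    """Collect the text field of every `D-<sample_id>` line in the output."""
    # Including the tab in the prefix keeps `D-0` from matching longer tags.
    prefix = f"D-{sample_id}\t"
    results = []
    for line in stdout.splitlines():
        if line.startswith(prefix):
            # Assumed fields: tag, score, detokenized text.
            results.append(line.split("\t", 2)[2])
    return results


if __name__ == "__main__":
    # Made-up sample output, not taken from a real run.
    sample = "\n".join(
        [
            "T-0\tHallo Welt",
            "H-0\t-0.5\t\u2581Hallo \u2581Welt",
            "D-0\t-0.5\tHallo Welt",
            "D-1\t-0.3\tEin anderer Satz",
        ]
    )
    print(extract_hypotheses(sample))
```

Returning a list instead of printing makes the empty case explicit, so a caller can tell when the subprocess produced no hypothesis line, for example because generation failed.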