balaramas committed on
Commit 6cc98e6
1 Parent(s): 22d4e6b

Upload 10 files

.gitattributes CHANGED
@@ -35,3 +35,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
 MUSTC_ROOT_french/en-fr/data/tst-COMMON/wav/ted_1096.wav filter=lfs diff=lfs merge=lfs -text
 MUSTC_ROOT_hindi/en-hi/data/tst-COMMON/wav/ted_1096.wav filter=lfs diff=lfs merge=lfs -text
+MUSTC_ROOT_german/en-de/data/tst-COMMON/wav/ted_1096.wav filter=lfs diff=lfs merge=lfs -text
MUSTC_ROOT_german/en-de/config_st.yaml ADDED
@@ -0,0 +1,19 @@
+bpe_tokenizer:
+  bpe: sentencepiece
+  sentencepiece_model: ./spm_unigram8000_st.model
+input_channels: 1
+input_feat_per_channel: 80
+specaugment:
+  freq_mask_F: 27
+  freq_mask_N: 1
+  time_mask_N: 1
+  time_mask_T: 100
+  time_mask_p: 1.0
+  time_wrap_W: 0
+transforms:
+  '*':
+  - utterance_cmvn
+  _train:
+  - utterance_cmvn
+  - specaugment
+vocab_filename: spm_unigram8000_st.txt
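
Note on the `specaugment` block above: the sketch below illustrates what frequency and time masking with these values could look like on an 80-bin filterbank matrix. It is not fairseq's implementation. It assumes the keys mean what their names suggest (`freq_mask_F` and `time_mask_T` as maximum mask widths, `freq_mask_N` and `time_mask_N` as mask counts, `time_mask_p` as an upper bound on the fraction of frames that may be masked), and the function name `spec_augment` is hypothetical.

```python
import random


def spec_augment(frames, freq_mask_F=27, freq_mask_N=1,
                 time_mask_T=100, time_mask_N=1, time_mask_p=1.0,
                 mask_value=0.0, rng=None):
    """Return a masked copy of `frames`, a list of equal-length feature rows.

    Hypothetical helper for illustration; defaults mirror config_st.yaml.
    """
    rng = rng or random.Random()
    out = [list(row) for row in frames]  # copy so the input stays untouched
    n_frames = len(out)
    n_bins = len(out[0]) if out else 0

    # Frequency masking: blank up to freq_mask_F consecutive feature bins.
    for _ in range(freq_mask_N):
        width = rng.randint(0, min(freq_mask_F, n_bins))
        start = rng.randint(0, n_bins - width)
        for row in out:
            row[start:start + width] = [mask_value] * width

    # Time masking: blank up to time_mask_T consecutive frames, capped at a
    # fraction time_mask_p of the utterance length.
    max_width = min(time_mask_T, n_frames, int(n_frames * time_mask_p))
    for _ in range(time_mask_N):
        width = rng.randint(0, max_width)
        start = rng.randint(0, n_frames - width)
        for t in range(start, start + width):
            out[t] = [mask_value] * n_bins
    return out


if __name__ == "__main__":
    # 140 frames x 80 bins, matching the first row of tst-COMMON_st.tsv.
    features = [[1.0] * 80 for _ in range(140)]
    masked = spec_augment(features, rng=random.Random(0))
    print(len(masked), len(masked[0]))
```

Because `specaugment` is listed only under `_train` in `transforms`, a masking step like this would presumably not run at inference time; only `utterance_cmvn` applies to the `'*'` case.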
MUSTC_ROOT_german/en-de/data/tst-COMMON/txt/tst-COMMON.de ADDED
@@ -0,0 +1,8 @@
+Zu Hause in New York, bin ich Chef der Entwicklungsabteilung einer gemeinnützigen Organisation namens Robin Hood.
+Wenn ich nicht die Armut bekämpfe, bekämpfe ich als Gehilfe eines Feuerwehr-Hauptmanns bei einem freiwilligen Löschzug das Feuer.
+Nun, in unserer Stadt, in der Freiwillige eine hochqualifizierte Berufsfeuerwehr unterstützten, muss man ziemlich früh an der Brandstelle sein, um mitmischen zu können.
+Ich erinnere mich an mein erstes Feuer.
+Ich war der zweite Freiwillige an der Brandstelle, ich hatte also recht gute Chancen hinein zu können.
+Aber es war immer noch ein Wettrennen gegen die anderen Freiwilligen um den verantwortlichen Hauptmann zu erreichen und herauszufinden was unsere Aufgaben sein würden.
+Als ich den Hauptmann fand, hatte er gerade in eine sehr ernste Unterhaltung mit der Hausbesitzerin, die sicherlich einen der schlimmsten Tage ihres Lebens hatte.
+Es war mitten in der Nacht und sie stand im Schlafanzug und barfuß unter einem Schirm draußen im strömenden Regen, während ihr Haus in Flammen stand.
MUSTC_ROOT_german/en-de/data/tst-COMMON/txt/tst-COMMON.en ADDED
@@ -0,0 +1,8 @@
+Back in New York, I am the head of development for a non-profit called Robin Hood.
+When I'm not fighting poverty, I'm fighting fires as the assistant captain of a volunteer fire company.
+Now in our town, where the volunteers supplement a highly skilled career staff, you have to get to the fire scene pretty early to get in on any action.
+I remember my first fire.
+I was the second volunteer on the scene, so there was a pretty good chance I was going to get in.
+But still it was a real footrace against the other volunteers to get to the captain in charge to find out what our assignments would be.
+When I found the captain, he was having a very engaging conversation with the homeowner, who was surely having one of the worst days of her life.
+Here it was, the middle of the night, she was standing outside in the pouring rain, under an umbrella, in her pajamas, barefoot, while her house was in flames.
MUSTC_ROOT_german/en-de/data/tst-COMMON/txt/tst-COMMON.yaml ADDED
@@ -0,0 +1,8 @@
+- {duration: 5.0, offset: 0.0, rW: 17, uW: 0, speaker_id: spk.1096, wav: test.wav}
+- {duration: 5.160000, offset: 20.290000, rW: 17, uW: 0, speaker_id: spk.1096, wav: ted_1096.wav}
+- {duration: 8.110000, offset: 25.930000, rW: 29, uW: 0, speaker_id: spk.1096, wav: ted_1096.wav}
+- {duration: 1.560000, offset: 34.920000, rW: 5, uW: 0, speaker_id: spk.1096, wav: ted_1096.wav}
+- {duration: 4.180000, offset: 36.730000, rW: 21, uW: 0, speaker_id: spk.1096, wav: ted_1096.wav}
+- {duration: 5.580000, offset: 41.880000, rW: 26, uW: 0, speaker_id: spk.1096, wav: ted_1096.wav}
+- {duration: 8.610001, offset: 48.309999, rW: 27, uW: 0, speaker_id: spk.1096, wav: ted_1096.wav}
+- {duration: 9.680000, offset: 57.510000, rW: 29, uW: 0, speaker_id: spk.1096, wav: ted_1096.wav}
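
Note on the segment file above: each entry gives an `offset` and `duration` in seconds into the named WAV file. The first entry points at `test.wav` with a fixed 5.0 s duration, and this commit removes the `app.py` code that used to overwrite that value with the real length of the recording. The sketch below shows how such an entry could be turned into a sample range. It assumes 16 kHz audio (suggested by the `convert_audio_to_16k_wav` helper in `app.py`), and `segment_to_samples` is a hypothetical name, not part of the repository.

```python
def segment_to_samples(offset, duration, sample_rate=16000):
    """Map a segment (seconds) to a half-open [start, end) sample range.

    Hypothetical helper; sample_rate=16000 is an assumption.
    """
    start = round(offset * sample_rate)
    end = start + round(duration * sample_rate)
    return start, end


if __name__ == "__main__":
    # Second entry of tst-COMMON.yaml: offset 20.29 s, duration 5.16 s.
    print(segment_to_samples(20.29, 5.16))
```

With a fixed 5.0 s entry, a recording longer than five seconds would presumably be truncated at sample 80000 under this reading, which is worth checking against the data preparation script.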
MUSTC_ROOT_german/en-de/data/tst-COMMON/wav/ted_1096.wav ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:69a122c3ad89320ec24cad84b622a01f26c3138b3e5869dc033e65bd0ab73fe1
+size 8990102
MUSTC_ROOT_german/en-de/fbank80.zip ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:0bef03a45d7514d5018c4de30d352c736359248e6e8d70d586796aa32b30f4e2
+size 5242360
MUSTC_ROOT_german/en-de/spm_unigram8000_st.model ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:1c1b53333e56d7dbf5adc03ffcbb3760de05def56f35e21e75b5702aeff38098
+size 379997
MUSTC_ROOT_german/en-de/spm_unigram8000_st.txt ADDED
The diff for this file is too large to render. See raw diff
 
MUSTC_ROOT_german/en-de/tst-COMMON_st.tsv ADDED
@@ -0,0 +1,9 @@
+id audio n_frames tgt_text speaker
+ted_1096_0 /home/deepakprasad/nlp_code/German_MUSTC/en-de/fbank80.zip:652123021:44928 140 Der Hauptmann winkte mich zu sich. spk.1096
+ted_1096_1 /home/deepakprasad/nlp_code/German_MUSTC/en-de/fbank80.zip:5396853429:272128 850 Er sagte, "Bezos, Sie müssen in das Haus gehen. Sie müssen nach oben gehen, an dem Feuer vorbei, und müssen dieser Frau ein Paar Schuhe holen." spk.1096
+ted_1096_2 /home/deepakprasad/nlp_code/German_MUSTC/en-de/fbank80.zip:41827555529:6208 19 (Gelächter) spk.1096
+ted_1096_3 /home/deepakprasad/nlp_code/German_MUSTC/en-de/fbank80.zip:46434264544:14848 46 Ich schwöre es. spk.1096
+ted_1096_4 /home/deepakprasad/nlp_code/German_MUSTC/en-de/fbank80.zip:13281223497:447168 1397 Nun, nicht genau das, was ich mir erhofft hatte, doch ich ging los -- die Treppen hoch, den Flur entlang, an den "echten" Feuerwehrmännern vorbei, die zu diesem Zeitpunkt mit dem Löschen schon so ziemlich fertig waren, in das Schlafzimmer um ein Paar Schuhe zu holen. spk.1096
+ted_1096_5 /home/deepakprasad/nlp_code/German_MUSTC/en-de/fbank80.zip:35969158042:152768 477 Ich weiß was Sie jetzt denken, aber ich bin kein Held. spk.1096
+ted_1096_6 /home/deepakprasad/nlp_code/German_MUSTC/en-de/fbank80.zip:41927428403:194048 606 Ich trug meine Beute zurück nach unten, wo ich an der Haustür meinen Erzfeind und den geliebten Hund traf. spk.1096
+ted_1096_7 /home/deepakprasad/nlp_code/German_MUSTC/en-de/fbank80.zip:5964728808:236928 740 Wir trugen unsere Schätze nach draußen zur Hausbesitzerin, wo, nicht überraschend, seiner wesentlich mehr Aufmerksamkeit bekam als meiner. spk.1096
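
Note on the manifest above: the `audio` column appears to follow the `zip_path:byte_offset:byte_size` convention that fairseq's speech-to-text data loaders use for features stored in an uncompressed ZIP archive. Under that reading, the committed rows look stale: the paths are absolute paths on another machine, and offsets such as 652123021 are far larger than the 5242360-byte `fbank80.zip` added in this commit. The sketch below parses the column and checks a slice against an archive size. `ZipEntry`, `parse_audio_field`, and `fits_in_archive` are hypothetical names used only for illustration.

```python
from typing import NamedTuple


class ZipEntry(NamedTuple):
    zip_path: str
    byte_offset: int
    byte_size: int


def parse_audio_field(field):
    """Split 'path:offset:size' from the right so colons in the path survive."""
    zip_path, offset, size = field.rsplit(":", 2)
    return ZipEntry(zip_path, int(offset), int(size))


def fits_in_archive(entry, archive_size):
    """True if the byte slice lies entirely inside an archive of this size."""
    return 0 <= entry.byte_offset and entry.byte_offset + entry.byte_size <= archive_size


if __name__ == "__main__":
    row = "/home/deepakprasad/nlp_code/German_MUSTC/en-de/fbank80.zip:652123021:44928"
    entry = parse_audio_field(row)
    # 5242360 is the size recorded in the Git LFS pointer for fbank80.zip.
    print(entry, fits_in_archive(entry, 5242360))
```

Since `app.py` reruns the data preparation script before every translation, the manifest is presumably regenerated at runtime, so the stale rows may be harmless; that is an inference from the diff, not something the commit states.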
app.py CHANGED
@@ -10,19 +10,7 @@ import sys
 import os
 import subprocess
 from pydub import AudioSegment
-import yaml
-import wave
-
-
-
-def get_wav_duration(file_path):
-    with wave.open(file_path, 'rb') as wav_file:
-        frames = wav_file.getnframes()
-        rate = wav_file.getframerate()
-        duration = frames / float(rate)
-        return duration
-
-
+from huggingface_hub import snapshot_download
 
 def install_fairseq():
     try:
@@ -57,50 +45,35 @@ def run_my_code(input_text, language):
     audio=convert_audio_to_16k_wav(input_text)
     hi_wav = audio
 
-
     data_root=""
     model_checkpoint=""
     d_r=""
-    yam=""
 
     if(language=="Hindi"):
         model_checkpoint = "./models/hindi_model.pt"
         data_root="./MUSTC_ROOT_hindi/en-hi/"
         d_r="MUSTC_ROOT_hindi/"
-        yam="./MUSTC_ROOT_hindi/en-hi/data/tst-COMMON/txt/tst-COMMON.yaml"
     if(language=="French"):
         model_checkpoint = "./models/french_model.pt"
         data_root="./MUSTC_ROOT_french/en-fr/"
         d_r="MUSTC_ROOT_french/"
-        yam="./MUSTC_ROOT_french/en-fr/data/tst-COMMON/txt/tst-COMMON.yaml"
 
-    #code to change the duration of the yaml file accordign to the audio input
-    with open(yam, 'r') as yaml_file:
-        data = yaml.safe_load(yaml_file)
-        data[0]['duration']=get_wav_duration(hi_wav)
-    with open(yam, 'w') as yaml_file:
-        yaml.dump(data, yaml_file)
+
 
     os.system(f"cp {hi_wav} {data_root}data/tst-COMMON/wav/test.wav")
 
-    print("------Starting data prepration------")
+    print("------Starting data prepration...")
     subprocess.run(["python", "prep_mustc_data_hindi_single.py", "--data-root", d_r, "--task", "st", "--vocab-type", "unigram", "--vocab-size", "8000"], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
 
-    print("------Performing translation------")
+    print("------Performing translation...")
 
-    translation_result = subprocess.run(["python", "generate.py", data_root, "--config-yaml", "config_st.yaml", "--gen-subset", "tst-COMMON_st", "--task", "speech_to_text", "--path", model_checkpoint], capture_output=True, text=True)
+    translation_result = subprocess.run(["fairseq-generate", data_root, "--config-yaml", "config_st.yaml", "--gen-subset", "tst-COMMON_st", "--task", "speech_to_text", "--path", model_checkpoint, "--max-tokens", "50000", "--beam", "5", "--scoring", "sacrebleu"], capture_output=True, text=True)
     translation_result_text = translation_result.stdout
 
     lines = translation_result_text.split("\n")
 
-
-    #just for checking the duration from the yaml file of the current input audio
-    with open(yam, 'r') as yaml_file:
-        data = yaml.safe_load(yaml_file)
-        print(data[0]['duration'], " seconds duration")
-
     output_text=""
-    print("\n\n------Translation results are:\n")
+    print("\n\n------Translation results are:")
     for i in lines:
         if (i.startswith("D-0")):
             print(i.split("\t")[2])
@@ -121,14 +94,14 @@ install_fairseq()
 #input_textbox = gr.inputs.Textbox(label="test2.wav")
 #input=gr.inputs.Audio(source="microphone", type="filepath", label="Record something (in English)...")
 #audio=convert_audio_to_16k_wav(input)
-output_textbox = gr.outputs.Textbox(label="The Translated Text is:")
+output_textbox = gr.outputs.Textbox(label="Output Text")
 
 # Create a Gradio interface
 iface = gr.Interface(
     fn=run_my_code,
-    inputs=[gr.inputs.Audio(source="microphone", type="filepath", label="Record something (in American/British English Accent)..."), gr.inputs.Radio(["Hindi", "French"], label="Language")],
+    inputs=[gr.inputs.Audio(source="microphone", type="filepath", label="Record something (in English)..."), gr.inputs.Radio(["Hindi", "French"], label="Language")],
     outputs=output_textbox,
-    title="English to Hindi/French Translator")
+    title="English to Hindi Translator")
 
 # Launch the interface
 iface.launch()
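
Note on the output parsing in `run_my_code`: the loop keeps only lines that start with `D-0` and takes the third tab-separated field. In `fairseq-generate` output, `D-<id>` lines carry the sample id, a score, and the detokenized hypothesis, so the sketch below generalizes the same idea to every sample and skips malformed lines instead of raising. `extract_hypotheses` is a hypothetical helper, and the sample text in the usage example is made up for illustration, not real model output.

```python
def extract_hypotheses(generate_stdout):
    """Collect {sample_id: hypothesis} from D-<id> lines of generation output.

    Hypothetical helper; assumes lines shaped like 'D-<id>\t<score>\t<text>'.
    """
    hypotheses = {}
    for line in generate_stdout.splitlines():
        if not line.startswith("D-"):
            continue
        parts = line.split("\t")
        if len(parts) < 3:
            continue  # malformed line: no hypothesis field
        try:
            sample_id = int(parts[0][2:])
        except ValueError:
            continue  # id is not an integer
        hypotheses[sample_id] = parts[2]
    return hypotheses


if __name__ == "__main__":
    # Made-up output for illustration only.
    sample = "S-0\tsource\nH-0\t-0.5\t▁hal lo\nD-0\t-0.5\thallo welt\nD-1\t-0.7\tguten tag\n"
    print(extract_hypotheses(sample))
```

Two things in this hunk may be worth a follow-up. The language selector still offers only "Hindi" and "French" even though the commit adds German data, and `run_my_code` has no German branch, so `model_checkpoint` and `data_root` would stay empty for any other value. The new `snapshot_download` import is also not used in any of the hunks shown.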