anonymoussubmitter222 committed on
Commit
a03fd36
·
1 Parent(s): 1b934f7

added description

  1. app.py +35 -0
app.py CHANGED
@@ -331,6 +331,41 @@ asr_brain = ASR(
331
  run_opts=run_opts,
332
  checkpointer=hparams["checkpointer"],
333
  )
334
+ description = """
335
+
336
+ # Global description
337
+
338
+ This is a SpeechBrain-based Automatic Speech Recognition (ASR) model for Tunisian Arabic. It outputs Tunisian transcriptions in Arabic script. Since the language has no standardized written form, the transcriptions may vary. This model is the work of Salah Zaiem, PhD candidate. Contact: [email protected]
339
+
340
+
341
+ # Pipeline description
342
+ This ASR system is composed of 2 different but linked blocks:
343
+ - Acoustic model (wavlm-large + CTC). A pretrained wavlm-large model (https://huggingface.co/microsoft/wavlm-large) is combined with two DNN layers and fine-tuned on a Tunisian Arabic dataset.
344
+ - KenLM-based 4-gram language model, trained on the training data.
345
+ The final acoustic representation is passed to the CTC greedy decoder.
346
+ The system is trained on single-channel recordings resampled to 16 kHz. (The model should also work well with audio upsampled from 8 kHz.)
347
+
348
+ # Limitations
349
+ Due to the nature of the available training data, the model may struggle with foreign words. Tunisian speakers commonly use foreign (mainly French) words, and these will lead to more errors. We are working on improving this in future models.
350
+
351
+ Inference runs on CPU to keep this Space free. This leads to long running times on long sequences. If you want to transcribe long sequences for your project or research, feel free to send an email to: [email protected]
352
+
353
+ # Referencing SpeechBrain
354
+
355
+ This work has no published paper yet, and may never have one. If you use it in an academic setting, please cite the original SpeechBrain paper:
356
+ ```
357
+ @misc{SB2021,
358
+ author = {Ravanelli, Mirco and Parcollet, Titouan and Rouhe, Aku and Plantinga, Peter and Rastorgueva, Elena and Lugosch, Loren and Dawalatabad, Nauman and Ju-Chieh, Chou and Heba, Abdel and Grondin, Francois and Aris, William and Liao, Chien-Feng and Cornell, Samuele and Yeh, Sung-Lin and Na, Hwidong and Gao, Yan and Fu, Szu-Wei and Subakan, Cem and De Mori, Renato and Bengio, Yoshua },
359
+ title = {SpeechBrain},
360
+ year = {2021},
361
+ publisher = {GitHub},
362
+ journal = {GitHub repository},
363
+ howpublished = {\\url{https://github.com/speechbrain/speechbrain}},
364
+ }
365
+ ```
366
+
367
+
368
+ """
369
  asr_brain.device= "cpu"
370
  asr_brain.modules.to("cpu")
371
  asr_brain.tokenizer = label_encoder
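
The description added in this commit says the final acoustic representation is given to a CTC greedy decoder. As a rough illustration of what that step does, here is a minimal pure-Python sketch. It is not the code used in `app.py`; the function name `ctc_greedy_decode`, the `blank_index` default, and the toy scores are all assumptions made for the example. The idea is to take the best-scoring token for each frame, collapse consecutive repeats, and then drop the blank token.

```python
from typing import List, Sequence


def ctc_greedy_decode(
    frame_scores: Sequence[Sequence[float]],
    blank_index: int = 0,  # assumed blank position; the real model may differ
) -> List[int]:
    """Greedy CTC decoding over per-frame token scores (hypothetical helper)."""
    decoded: List[int] = []
    previous = None
    for scores in frame_scores:
        # Pick the highest-scoring token index for this frame.
        best = max(range(len(scores)), key=scores.__getitem__)
        # Emit it only if it is not blank and not a repeat of the previous frame.
        if best != previous and best != blank_index:
            decoded.append(best)
        previous = best
    return decoded


# Toy input: 5 frames over a 3-token vocabulary (index 0 is the blank).
frames = [
    [0.1, 0.8, 0.1],    # token 1
    [0.1, 0.7, 0.2],    # token 1 again (collapsed as a repeat)
    [0.9, 0.05, 0.05],  # blank (separates the two runs of token 1)
    [0.2, 0.7, 0.1],    # token 1 (kept, because a blank came in between)
    [0.1, 0.2, 0.7],    # token 2
]
print(ctc_greedy_decode(frames))  # [1, 1, 2]
```

In the Space itself, the decoded indices would then be mapped back to characters by the label encoder assigned to `asr_brain.tokenizer`. That mapping is left out here because the encoder's contents are not shown in the diff.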