

YiTrans (IWSLT 2022): The YiTrans End-to-End Speech Translation System for IWSLT 2022 Offline Shared Task

The code is being merged into this repository. Thanks for your attention.


git clone
git submodule update --init YiTrans/fairseq
cd YiTrans/fairseq
pip install -e .

Data Preparation

Speech/ASR data for pre-training

Please follow the HuBERT data preparation steps here.

Monolingual text data for pre-training

Please follow the mBART data preparation steps here. We reuse the mBART multilingual vocabulary. Once your subset.{idx,bin} files are ready, rename them to subset.lang.lang.{idx,bin}, e.g.
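The renaming itself is just two mv commands. The following is a minimal bash sketch, assuming the binarized files sit next to each other as <prefix>.idx and <prefix>.bin; rename_mono is a hypothetical helper, not part of YiTrans or fairseq, and the language code you pass (for example en_XX) is only a placeholder.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of YiTrans/fairseq):
# rename <prefix>.{idx,bin} to <prefix>.<lang>.<lang>.{idx,bin}.
rename_mono() {
  local prefix="$1" lang="$2" ext
  # Refuse to rename anything unless both files are present.
  for ext in idx bin; do
    if [ ! -f "${prefix}.${ext}" ]; then
      echo "missing ${prefix}.${ext}" >&2
      return 1
    fi
  done
  for ext in idx bin; do
    mv -- "${prefix}.${ext}" "${prefix}.${lang}.${lang}.${ext}" || return 1
  done
}
```

For instance, rename_mono data-bin/train en_XX (placeholder path and language) turns train.idx and train.bin into train.en_XX.en_XX.idx and train.en_XX.en_XX.bin.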


Bilingual text data for pre-training

Prepare the bilingual data the same way as the monolingual data, except that you need to prepare both the source and the target language. Rename the files to subset.src-tgt.{src,tgt}.{idx,bin}, e.g.
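Before launching pre-training, it can help to confirm that all four bilingual files follow this naming. Below is a minimal bash sketch; check_bilingual is a hypothetical helper, and the directory, subset name, and language codes in the usage line are placeholders.

```bash
#!/usr/bin/env bash
# Hypothetical check (not part of YiTrans/fairseq): verify that
# <dir>/<subset>.<src>-<tgt>.<lang>.{idx,bin} exists for lang in {src, tgt}.
check_bilingual() {
  local dir="$1" subset="$2" src="$3" tgt="$4" lang ext f missing=0
  for lang in "$src" "$tgt"; do
    for ext in idx bin; do
      f="${dir}/${subset}.${src}-${tgt}.${lang}.${ext}"
      if [ -f "$f" ]; then
        echo "ok      $f"
      else
        echo "missing $f" >&2
        missing=1
      fi
    done
  done
  # Non-zero status if any of the four files is absent.
  return "$missing"
}
```

Example (placeholders): check_bilingual data-bin train en_XX de_DE expects train.en_XX-de_DE.en_XX.{idx,bin} and train.en_XX-de_DE.de_DE.{idx,bin}.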


ST data for fine-tuning

Please follow the data preparation steps for S2T tasks here. Your TSV file should look like this:

id      audio   n_frames        tgt_text        speaker src_text        src_lang        tgt_lang
ted_1_0 /mnt/speechdata/MUSTC/en-de/flac/ted_1_0.flac    25920   Hinter mir war gar keine Autokolonne.   spk.1   There was no motorcade back there.      en_XX   de_DE
ted_1_1 /mnt/speechdata/MUSTC/en-de/flac/ted_1_1.flac    219359  Haben Sie schon mal vom Phantomschmerz gehört? (Lachen) Wir saßen in einem gemieteten Ford Taurus.       spk.1   (Laughter) You've heard of phantom limb pain? (Laughter)        en_XX   de_DE
ted_1_2 /mnt/speechdata/MUSTC/en-de/flac/ted_1_2.flac    71360   Es war Zeit zum Abendessen und wir hielten Ausschau nach einem Restaurant.      spk.1   It was dinnertime, and we started looking for a place to eat.    en_XX   de_DE
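A quick sanity check on a manifest like this can catch broken rows before training. The following is a minimal bash/awk sketch, assuming fields are tab-separated (as the .tsv name suggests); check_manifest is a hypothetical helper, and it requires the exact column order shown above, which may be stricter than the data loader actually needs.

```bash
#!/usr/bin/env bash
# Hypothetical manifest check (not part of YiTrans/fairseq):
# header must be the eight columns shown above, in order, and every row must
# have eight tab-separated fields with a positive integer n_frames.
check_manifest() {
  awk -F'\t' '
    NR == 1 {
      n = split("id audio n_frames tgt_text speaker src_text src_lang tgt_lang", cols, " ")
      if (NF != n) { print "bad header" > "/dev/stderr"; bad = 1; exit }
      for (i = 1; i <= n; i++)
        if ($i != cols[i]) { print "bad header column " i ": " $i > "/dev/stderr"; bad = 1; exit }
      next
    }
    NF != 8 || $3 !~ /^[0-9]+$/ || $3 == 0 {
      print "bad row " NR > "/dev/stderr"; bad = 1
    }
    END { exit bad }
  ' "$1"
}
```

Usage: check_manifest train_st.tsv (placeholder file name) returns a non-zero status and prints the offending line numbers to stderr if anything is malformed.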


For example, to pre-train the PT36 model, follow these steps:

Step 0: Download the released HuBERT model and mBART model.

Step 1: Pre-training with unlabeled speech data and monolingual/bilingual text data

bash YiTrans/exp_scripts/pretrain/

Step 2: Pre-training with ASR data and domain-filtered bilingual text data

bash YiTrans/exp_scripts/pretrain/

Other configurations, such as training PT48, can also be found in ./YiTrans/exp_scripts/pretrain. You may need to modify the PATH variables in the .sh files to point to your data.


For example, to fine-tune an En-De ST model on the MuST-C dataset:

bash YiTrans/exp_scripts/finetune_ST/en-de/

Other configurations, such as different translation directions or datasets, can be found in ./YiTrans/exp_scripts/finetune_ST. You may need to modify the PATH variables in the .sh files to point to your data.

Cascaded system

You can also build a cascaded ST system (ASR+MT) with our codebase.

  1. ASR model: fine-tune from the cascade of HuBERT Large and mBART models:

    # change the mbart_path/hubert_path to your own in the *.sh
    bash YiTrans/exp_scripts/finetune_ASR/

    Check the .sh file for more information about the configuration.

  2. MT model: fine-tune from mBART model:

    # change the mbart_path to your own in the *.sh
    bash YiTrans/exp_scripts/finetune_MT/

    Check the .sh file for more information about the configuration.
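Editing mbart_path/hubert_path (or the PATH variables mentioned for the pre-training and ST scripts) can also be scripted. Below is a minimal bash sketch under stated assumptions: set_sh_var is a hypothetical helper, it assumes the variable is assigned at the start of a line as NAME=..., and it assumes the new value contains no '|', '&', or '\' characters, which sed would treat specially in the replacement.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of YiTrans): rewrite "NAME=..." lines in a .sh file.
# Assumes the assignment starts at the beginning of a line and VALUE has no | & or \.
set_sh_var() {
  local file="$1" name="$2" value="$3"
  if ! grep -q "^${name}=" "$file"; then
    echo "no '${name}=' assignment found in ${file}" >&2
    return 1
  fi
  # Write to a temp file, then copy back with cat so the original file's permissions are kept.
  sed "s|^${name}=.*|${name}=${value}|" "$file" > "${file}.tmp" &&
    cat "${file}.tmp" > "$file" &&
    rm -f "${file}.tmp"
}
```

Usage (script name and path are placeholders): set_sh_var YiTrans/exp_scripts/finetune_ASR/<script>.sh mbart_path /path/to/mbart.pt. Check the result against the script afterwards, since the real scripts may assign these variables differently.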


If you find our work useful in your research, please cite the following paper:

  title   = {The YiTrans End-to-End Speech Translation System for IWSLT 2022 Offline Shared Task},
  author  = {Zhang, Ziqiang and Ao, Junyi and Zhou, Long and Liu, Shujie and Wei, Furu and Li, Jinyu},