This repository is part of the WenetSpeech-Chuan collection: a large-scale open-source corpus with a full processing pipeline and benchmarks for ASR and TTS.
```
WSChuan-ASR
├── paraformer_large_chuan/
│   ├── config.yaml
│   ├── model.pt
│   └── infer.py
│
├── Qwen2.5-omni3B/
│   ├── added_tokens.json
│   ├── args.json
│   ├── char_template.jinja
│   ├── config.json
│   ├── generation_config.json
│   ├── merges.txt
│   ├── model-00001-of-00003.safetensors
│   ├── model-00002-of-00003.safetensors
│   ├── model-00003-of-00003.safetensors
│   ├── model.safetensors.index.json
│   ├── preprocessor_config.json
│   ├── special_tokens_map.json
│   ├── spk_dict.pt
│   ├── tokenizer_config.json
│   ├── tokenizer.json
│   ├── video_preprocessor_config.json
│   └── vocab.json
│
├── .gitattributes
└── README.md
```
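To fetch these files locally you can download the whole repository with `huggingface_hub`. A minimal sketch, assuming the model is hosted on the Hugging Face Hub; the `repo_id` below is a placeholder, so substitute the actual repository id:

```python
# Minimal download sketch (assumption: the repo id below is a placeholder --
# replace it with the actual Hugging Face repository id for WSChuan-ASR).
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="ASLP-lab/WSChuan-ASR",   # hypothetical id, adjust as needed
    local_dir="./WSChuan-ASR",        # where the checkpoints are placed
)
print(f"Repository downloaded to: {local_dir}")
```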
ASR results on WSC-Eval-ASR and the Magicdata test sets (character error rate, %; lower is better):

| Model | Model Size | WSC-Eval-ASR Easy | WSC-Eval-ASR Hard | WSC-Eval-ASR Total | Magicdata Conversation | Magicdata Daily-Use | Avg. |
|---|---|---|---|---|---|---|---|
| **with LLM** | | | | | | | |
| Kimi-Audio | 7B | 16.65 | 28.66 | 17.66 | 24.67 | 5.77 | 18.68 |
| FireRedASR-LLM | 8.3B | 12.80 | 25.27 | 14.40 | 17.68 | 6.69 | 15.37 |
| Qwen2.5-omni | 3B | 16.94 | 26.01 | 18.20 | 20.40 | 6.32 | 17.69 |
| Qwen2.5-omni-WSC-Finetune⭐ | 3B | 14.36 | 24.14 | 15.61 | 18.45 | 6.15 | 15.74 |
| Qwen2.5-omni + internal data⭐ | 3B | 13.17 | 23.36 | 14.81 | 18.50 | 5.88 | 15.14 |
| Qwen2.5-omni-WSC-Finetune + internal data⭐ | 3B | 12.93 | 23.19 | 14.25 | 17.95 | 5.89 | 14.84 |
| **without LLM** | | | | | | | |
| SenseVoice-small | 234M | 17.43 | 28.38 | 18.39 | 23.50 | 8.77 | 19.29 |
| Whisper | 244M | 52.06 | 63.99 | 53.59 | 55.88 | 52.03 | 55.51 |
| FireRedASR-AED | 1.1B | 13.29 | 23.64 | 14.62 | 17.84 | 6.69 | 15.14 |
| Paraformer | 220M | 14.34 | 24.61 | 15.66 | 19.81 | 8.16 | 16.52 |
| Paraformer-WSC-Finetune⭐ | 220M | 12.15 | 22.60 | 13.51 | 16.60 | 8.02 | 14.58 |
| Paraformer + internal data⭐ | 220M | 11.93 | 21.82 | 13.14 | 15.61 | 6.77 | 13.85 |
| Paraformer-WSC-Finetune + internal data⭐ | 220M | 11.59 | 21.59 | 12.87 | 14.59 | 6.28 | 13.38 |
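For reference, a minimal sketch of how CER numbers like those above can be computed from inference output, assuming the hypothesis and reference files hold one `utt_id transcript` pair per line (e.g. the `hyp.txt` produced below); the file names and the plain character-level Levenshtein scoring are assumptions, not the benchmark's official scoring script or text normalization:

```python
# Minimal CER scoring sketch (assumptions: "ref.txt" and "hyp.txt" hold
# "utt_id transcript" lines; scoring is plain character-level Levenshtein).
def edit_distance(ref: str, hyp: str) -> int:
    # Classic dynamic-programming Levenshtein distance over characters.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (r != h)))   # substitution
        prev = cur
    return prev[-1]

def load(path: str) -> dict:
    # Parse "utt_id transcript" lines; drop spaces inside the transcript.
    with open(path, encoding="utf-8") as f:
        return {u: t.replace(" ", "") for u, _, t in
                (line.strip().partition(" ") for line in f if line.strip())}

refs, hyps = load("ref.txt"), load("hyp.txt")
errors = sum(edit_distance(refs[u], hyps.get(u, "")) for u in refs)
chars = sum(len(t) for t in refs.values())
print(f"CER: {100.0 * errors / chars:.2f}%")
```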
Paraformer inference (the `paraformer_large_chuan` checkpoint) on the WSC-Eval-ASR test sets:

```bash
export CUDA_VISIBLE_DEVICES=7

root_dir=./test_data
test_sets=("WSC-Eval-ASR" "WSC-Eval-ASR-Hard" "WSC-Eval-ASR-Easy")
model_dir=./model_dir
out_rootdir=./results
mkdir -p $out_rootdir

# Run inference on each evaluation set; hypotheses go to $out_dir/hyp.txt.
for test_set in "${test_sets[@]}"; do
  out_dir=$out_rootdir/$test_set
  mkdir -p $out_dir
  python infer_paraformer.py \
    --model $model_dir \
    --wav_scp_file $root_dir/$test_set/wav.scp \
    --output_dir $out_dir \
    --device "cuda" \
    --output_file $out_dir/hyp.txt
done
```
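The `paraformer_large_chuan` directory appears to be a FunASR-style checkpoint (`config.yaml` + `model.pt`), so for a quick single-utterance test it can likely be loaded with FunASR's standard `AutoModel` API as well. A minimal sketch; the paths and the example wav are placeholders, and `infer_paraformer.py` may apply additional pre/post-processing not shown here:

```python
# Quick single-utterance sketch with FunASR (assumption: the checkpoint
# directory is FunASR-compatible; adjust paths to your local layout).
from funasr import AutoModel

model = AutoModel(model="./paraformer_large_chuan")       # local checkpoint dir
result = model.generate(input="example_sichuanese.wav")   # hypothetical wav file
print(result[0]["text"])                                  # recognized transcript
```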
Qwen2.5-omni inference:

```bash
python infer_qwen2.5omni.py \
  --wavs_path /path/to/your/wav.scp \
  --out_path /path/to/your/results.txt \
  --gpu 0 \
  --model /path/to/your/model
```
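Both scripts take a `wav.scp` file, which (judging from the `--wav_scp_file`/`--wavs_path` arguments) is assumed here to be Kaldi-style, i.e. one `utt_id /absolute/path.wav` pair per line. A small sketch for building one from a directory of wav files; the directory path is a placeholder:

```python
# Build a Kaldi-style wav.scp ("utt_id /abs/path.wav" per line) from a
# directory of audio files (assumption about the expected input format).
from pathlib import Path

wav_dir = Path("/path/to/your/wavs")       # placeholder directory
with open("wav.scp", "w", encoding="utf-8") as scp:
    for wav in sorted(wav_dir.glob("*.wav")):
        scp.write(f"{wav.stem} {wav.resolve()}\n")
```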