ASR-LLM Group: Generative Error Correction

AI & ML interests

LLMs for text-based speech processing

Organization Card

GenSEC: Text-based Generative Audio & Speech Recognition with Cascaded ASR-LLMs
- Task 1: ASR N-best hypotheses correction
- Task 2: Speaker Tagging from N-best hypotheses
- Task 3: Emotion Recognition from N-best hypotheses
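
As a rough illustration of Task 1, the sketch below shows one way an N-best list could be turned into a text prompt for an LLM, along with a word error rate (WER) helper for scoring a corrected transcript. The function names, the prompt wording, and the example hypotheses are all hypothetical and are not taken from the challenge baselines; the actual prompt format used by the baseline model may differ.

```python
from typing import List


def build_correction_prompt(hypotheses: List[str]) -> str:
    """Format an N-best list as a text prompt (hypothetical template)."""
    if not hypotheses:
        raise ValueError("need at least one hypothesis")
    # Number the hypotheses in rank order, best first.
    lines = [f"{rank}. {hyp}" for rank, hyp in enumerate(hypotheses, start=1)]
    return (
        "Below are the N-best hypotheses from a speech recognizer, best first.\n"
        + "\n".join(lines)
        + "\nWrite the most likely correct transcription:"
    )


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must not be empty")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1] / len(ref)


# Made-up N-best list, for illustration only.
n_best = [
    "i scream for ice cream",
    "eye scream for ice cream",
    "i scream for i scream",
]
prompt = build_correction_prompt(n_best)
print(prompt)
```

The resulting prompt string would be passed to an LLM, and `word_error_rate` could then compare the model's output against a reference transcript. Tasks 2 and 3 would need different prompts and metrics, which this sketch does not cover.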
Open-Source Model
- Llama2-7b pre-trained for ASR correction
  - https://huggingface.co/GenSEC-LLM/SLT-Task1-Llama2-7b-HyPo-baseline
Reference paper: IEEE SLT 2024. See the resources below for baseline models and datasets.

@inproceedings{yang2024large,
  title={Large language model based generative error correction: A challenge and baselines for speech recognition, speaker tagging, and emotion recognition},
  author={Yang, Chao-Han Huck and Park, Taejin and Gong, Yuan and Li, Yuanchao and Chen, Zhehuai and Lin, Yen-Ting and Chen, Chen and Hu, Yuchen and Dhawan, Kunal and {\.Z}elasko, Piotr and others},
  booktitle={2024 IEEE Spoken Language Technology Workshop (SLT)},
  pages={371--378},
  year={2024},
  organization={IEEE}
}