arxiv:2110.01900

DistilHuBERT: Speech Representation Learning by Layer-wise Distillation of Hidden-unit BERT

Published on Oct 5, 2021
Authors: Heng-Jui Chang, Shu-wen Yang, Hung-yi Lee

Abstract

Self-supervised speech representation learning methods like wav2vec 2.0 and Hidden-unit BERT (HuBERT) leverage unlabeled speech data for pre-training and offer good representations for numerous speech processing tasks. Despite their success, these methods require large memory and high pre-training costs, making them inaccessible to researchers in academia and small companies. This paper therefore introduces DistilHuBERT, a novel multi-task learning framework that distills hidden representations directly from a HuBERT model. The method reduces HuBERT's size by 75% and makes it 73% faster while retaining most of its performance on ten different tasks. Moreover, DistilHuBERT requires little training time and data, opening the possibility of pre-training personal and on-device SSL models for speech.
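The abstract describes the distillation objective only at a high level. Below is a minimal, hypothetical PyTorch sketch of a layer-wise, multi-task distillation loss in that spirit: a small student predicts several hidden layers of a frozen HuBERT teacher through separate prediction heads, combining an L1 term with a cosine-similarity term per head. The layer indices, the loss weighting `lam`, and all module and variable names are illustrative assumptions, not the paper's exact recipe.

```python
# Sketch of a layer-wise, multi-task distillation loss (illustrative, not the
# authors' exact configuration).
import torch
import torch.nn as nn
import torch.nn.functional as F

class LayerwiseDistillHeads(nn.Module):
    def __init__(self, student_dim: int, teacher_dim: int, target_layers=(4, 8, 12)):
        super().__init__()
        self.target_layers = target_layers
        # One prediction head per distilled teacher layer (multi-task setup).
        self.heads = nn.ModuleList(
            nn.Linear(student_dim, teacher_dim) for _ in target_layers
        )

    def forward(self, student_hidden, teacher_hidden_states, lam: float = 1.0):
        """student_hidden: (B, T, student_dim) output of the small student encoder.
        teacher_hidden_states: sequence of (B, T, teacher_dim) tensors from the
        frozen teacher, indexed by layer."""
        loss = 0.0
        for head, layer in zip(self.heads, self.target_layers):
            pred = head(student_hidden)                     # (B, T, teacher_dim)
            target = teacher_hidden_states[layer].detach()  # stop gradient on teacher
            # L1 reconstruction plus a cosine-similarity term per head.
            l1 = F.l1_loss(pred, target)
            cos = F.cosine_similarity(pred, target, dim=-1)  # (B, T)
            loss = loss + l1 - lam * F.logsigmoid(cos).mean()
        return loss / len(self.heads)

# Example: distill three teacher layers of width 768 into a 768-dim student.
heads = LayerwiseDistillHeads(student_dim=768, teacher_dim=768)
student_out = torch.randn(2, 50, 768)                          # (batch, frames, dim)
teacher_states = [torch.randn(2, 50, 768) for _ in range(13)]  # embeddings + 12 layers
print(heads(student_out, teacher_states).item())
```

In practice the teacher activations would come from a pre-trained HuBERT run with hidden states exposed, and only the student and its prediction heads are updated.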

Models citing this paper 6

Datasets citing this paper 0

No datasets citing this paper.

Spaces citing this paper 5

Collections including this paper 0

No collections including this paper.
