CL AS Oct 5, 2021

DistilHuBERT: Speech Representation Learning by Layer-wise Distillation of Hidden-unit BERT

Heng-Jui Chang, Shu-wen Yang, Hung-yi Lee

arXiv:2110.01900v4 13.4 214 citations

Originality: Incremental advance

AI Analysis

This work addresses accessibility issues for researchers in academia and small companies by enabling more efficient pre-training of speech models, though it is incremental as it builds on existing HuBERT methods.

The paper tackles the high memory and pre-training costs of self-supervised speech representation models like HuBERT by introducing DistilHuBERT, a layer-wise distillation framework that shrinks the model by 75% and makes it 73% faster while retaining most of HuBERT's performance across ten tasks.

Self-supervised speech representation learning methods like wav2vec 2.0 and Hidden-unit BERT (HuBERT) leverage unlabeled speech data for pre-training and offer good representations for numerous speech processing tasks. Despite their success, these methods require large memory and incur high pre-training costs, making them inaccessible to researchers in academia and small companies. Therefore, this paper introduces DistilHuBERT, a novel multi-task learning framework that distills hidden representations directly from a HuBERT model. This method reduces HuBERT's size by 75% and makes it 73% faster while retaining most of its performance in ten different tasks. Moreover, DistilHuBERT requires little training time and data, opening up the possibility of pre-training personal and on-device SSL models for speech.
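
To make the idea of layer-wise distillation concrete, here is a minimal sketch of how a multi-task distillation objective might be computed. The abstract does not state the loss, so everything here is an assumption made for illustration: the function name `layerwise_distill_loss`, the layer indices 4 and 8, and the choice of an L1 term plus a cosine-similarity term per frame. The student is assumed to produce one prediction per selected teacher layer, and the per-layer losses are summed.

```python
import math


def _cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norm + 1e-8)  # small epsilon avoids division by zero


def layerwise_distill_loss(student_preds, teacher_layers, cos_weight=1.0):
    """Sum of per-layer distillation losses (illustrative form, not from the abstract).

    Both arguments map a teacher layer index to a list of frames,
    where each frame is a list of floats of the same dimension.
    """
    total = 0.0
    for layer, target_frames in teacher_layers.items():
        pred_frames = student_preds[layer]
        layer_loss = 0.0
        for pred, target in zip(pred_frames, target_frames):
            # Mean absolute error over the feature dimension.
            l1 = sum(abs(p - t) for p, t in zip(pred, target)) / len(target)
            # -log(sigmoid(cos)) written as log(1 + exp(-cos)).
            cos_term = math.log1p(math.exp(-_cosine(pred, target)))
            layer_loss += l1 + cos_weight * cos_term
        # Average over frames, then add this layer's task to the total.
        total += layer_loss / len(target_frames)
    return total


# Hypothetical teacher hidden states for two layers, two frames each.
teacher = {4: [[1.0, 2.0], [0.5, -1.0]], 8: [[-1.0, 3.0], [2.0, 2.0]]}
student = {k: [[x + 0.1 for x in f] for f in v] for k, v in teacher.items()}
print(layerwise_distill_loss(student, teacher))
```

Treating each teacher layer as its own prediction task is what makes the objective multi-task in this sketch; a perfect student still incurs a constant from the cosine term, so only differences in the loss are meaningful.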
