CL SD AS Apr 13, 2022

HuBERT-EE: Early Exiting HuBERT for Efficient Speech Recognition

arXiv:2204.06328v23.222 citations h-index: 9

Originality Incremental advance

AI Analysis

This work addresses efficiency issues in automatic speech recognition for applications requiring faster inference, but it is incremental as it builds on existing HuBERT models.

The paper tackles the high computational cost and slow inference speed of self-supervised speech recognition models like HuBERT by introducing HuBERT-EE, an early exit scheme that dynamically stops inference based on confidence, achieving a balance between performance and latency on LibriSpeech.

Pre-training with self-supervised models, such as Hidden-unit BERT (HuBERT) and wav2vec 2.0, has brought significant improvements in automatic speech recognition (ASR). However, these models usually incur a high computational cost to achieve their performance, which slows down inference. To improve model efficiency, we introduce an early exit scheme for ASR, namely HuBERT-EE, that allows the model to stop inference dynamically. In HuBERT-EE, multiple early exit branches are added at intermediate layers. When the intermediate prediction of an early exit branch is confident, the model stops inference and returns the corresponding result early. We investigate the proper early exiting criterion and fine-tuning strategy to perform early exiting effectively. Experimental results on LibriSpeech show that HuBERT-EE can accelerate HuBERT inference while balancing the trade-off between performance and latency.
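
The early exit loop described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function names (`confidence`, `early_exit_forward`), the use of plain Python callables as stand-ins for encoder layers and exit branches, and the specific criterion (mean of the largest softmax probability over frames, compared with a threshold) are all assumptions made for illustration. The abstract only states that an exit is taken when the intermediate prediction is confident.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Frames = List[List[float]]            # one vector per audio frame
Layer = Callable[[Frames], Frames]    # hypothetical stand-in for an encoder layer
Branch = Callable[[Frames], Frames]   # hypothetical stand-in for an exit head


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one frame's logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def confidence(frame_logits: Frames) -> float:
    """Assumed criterion: mean over frames of the largest softmax probability."""
    return sum(max(softmax(f)) for f in frame_logits) / len(frame_logits)


def early_exit_forward(
    x: Frames,
    layers: Sequence[Layer],
    branches: Sequence[Optional[Branch]],
    threshold: float,
) -> Tuple[Frames, int]:
    """Run layers in order and return (logits, index of the layer that exited).

    branches[i] is the exit head attached after layers[i], or None if that
    layer has no exit. The last layer must have a head, since it is the
    fallback when no earlier branch is confident enough.
    """
    if not layers or len(layers) != len(branches) or branches[-1] is None:
        raise ValueError("need one entry per layer and a head on the last layer")
    last = len(layers) - 1
    for i, (layer, branch) in enumerate(zip(layers, branches)):
        x = layer(x)
        if branch is None:
            continue
        logits = branch(x)
        # Stop as soon as a branch is confident; the final layer always returns.
        if i == last or confidence(logits) >= threshold:
            return logits, i
    raise AssertionError("unreachable: the last layer always returns")


# Toy usage: each "layer" doubles the features, and each branch treats the
# features directly as logits, so confidence grows with depth.
scale: Layer = lambda frames: [[2.0 * v for v in f] for f in frames]
identity: Branch = lambda frames: frames
layers = [scale, scale, scale]
branches = [identity, identity, identity]

logits, exit_layer = early_exit_forward([[1.0, 0.0]], layers, branches, threshold=0.95)
print(exit_layer)  # 1: the second layer's branch is the first to reach 0.95
```

Raising the threshold in this sketch pushes the exit to deeper layers (more computation, more reliable prediction), while lowering it exits earlier, which is the performance-latency trade-off the abstract refers to.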
