AS LG SD Oct 20, 2020

Pushing the Limits of Semi-Supervised Learning for Automatic Speech Recognition

Yu Zhang, James Qin, Daniel S. Park, Wei Han, Chung-Cheng Chiu, Ruoming Pang, Quoc V. Le, Yonghui Wu

arXiv:2010.10504v235.5334 citations

Originality: Incremental advance

AI Analysis

This work improves speech recognition accuracy for applications such as transcription, but the advance is incremental because it builds on a combination of existing methods.

The paper improves automatic speech recognition by combining semi-supervised learning techniques. It reaches word error rates of 1.4%/2.6% on the LibriSpeech test/test-other sets, compared with the previous state of the art of 1.7%/3.3%.

We employ a combination of recent developments in semi-supervised learning for automatic speech recognition to obtain state-of-the-art results on LibriSpeech utilizing the unlabeled audio of the Libri-Light dataset. More precisely, we carry out noisy student training with SpecAugment using giant Conformer models pre-trained using wav2vec 2.0 pre-training. By doing so, we are able to achieve word-error-rates (WERs) 1.4%/2.6% on the LibriSpeech test/test-other sets against the current state-of-the-art WERs 1.7%/3.3%.
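
For readers unfamiliar with the metric, here is a minimal sketch of how a word error rate is typically computed, and of how the reported figures compare with the previous state of the art in relative terms. It is not the authors' evaluation code: the function names `word_error_rate` and `relative_reduction` are hypothetical, and the sketch assumes plain whitespace tokenization with no text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()  # assumption: whitespace tokenization, no normalization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction by which `improved` lowers `baseline`."""
    return (baseline - improved) / baseline


# One deleted word out of six reference words.
print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))

# WER figures quoted in the abstract (test and test-other).
print(f"{relative_reduction(1.7, 1.4):.1%}")  # 17.6%
print(f"{relative_reduction(3.3, 2.6):.1%}")  # 21.2%
```

On the abstract's numbers, the absolute gains of 0.3 and 0.7 points correspond to relative WER reductions of roughly 17.6% on test and 21.2% on test-other.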
