CL · Jun 27, 2022

Wav2Vec-Aug: Improved self-supervised training with limited data

Baidu, CMU, Meta AI
arXiv:2206.13654v1 · 16 citations · h-index: 52
Originality: Incremental advance
AI Analysis

The paper addresses the shortage of unlabeled data that limits speech SSL for many languages. The contribution appears incremental, since it builds on Wav2Vec 2.0.

The paper tackles the problem of applying self-supervised learning to speech domains with limited data by using data augmentation for Wav2Vec 2.0 pretraining, achieving up to a 13% relative improvement in word error rate on Librispeech benchmarks.

Self-supervised learning (SSL) of speech representations has received much attention over the last few years but most work has focused on languages and domains with an abundance of unlabeled data. However, for many languages there is a shortage even in the unlabeled data which limits the effectiveness of SSL. In this work, we focus on the problem of applying SSL to domains with limited available data by leveraging data augmentation for Wav2Vec 2.0 pretraining. Further, we propose improvements to each component of the model which result in a combined relative word error rate (WER) improvement of up to 13% compared to Wav2Vec 2.0 on Librispeech test-clean / other.
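To make the "relative WER improvement" figure concrete, the following is a minimal sketch of how such a number is typically computed. The function name and the example WER values are hypothetical and chosen only for illustration; they are not results reported in the paper.

```python
def relative_wer_improvement(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction as a fraction of the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Hypothetical WERs in percent (assumed values, not taken from the paper).
    baseline = 10.0  # e.g. a baseline model
    improved = 8.7   # e.g. the same model with augmentation
    gain = relative_wer_improvement(baseline, improved)
    print(f"Relative WER improvement: {gain:.1%}")
```

A relative improvement is measured against the baseline error rate, so under these assumed values a drop from 10.0 to 8.7 WER is a 1.3-point absolute reduction and a 13% relative one.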

