CL · Nov 11, 2020

Text Augmentation for Language Models in High Error Recognition Scenario

arXiv:2011.06056v1 · 3 citations
AI Analysis

This work targets speech recognition accuracy in high-error conditions. It is incremental: it builds on existing augmentation techniques, with a specific focus on ASR error statistics.

The paper aims to improve language models for speech recognition by comparing data augmentation methods. It finds that a simple scheme based on global error statistics outperforms the other approaches, raising the absolute WER improvement from second-pass rescoring from 1.1% to 1.9% on the CHiME-6 challenge.

We examine the effect of data augmentation for training of language models for speech recognition. We compare augmentation based on global error statistics with one based on per-word unigram statistics of ASR errors and observe that it is better to pay attention only to the global substitution, deletion and insertion rates. This simple scheme also performs consistently better than label smoothing and its sampled variants. Additionally, we investigate the behavior of perplexity estimated on augmented data, but conclude that it gives no better prediction of the final error rate. Our best augmentation scheme increases the absolute WER improvement from second-pass rescoring from 1.1% to 1.9% on the CHiME-6 challenge.
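To make the idea of augmentation from global error statistics concrete, here is a minimal Python sketch. It is not the authors' implementation. The function name `augment_tokens`, the example rates, and the choice to draw replacement and inserted words uniformly from the vocabulary are all assumptions made for illustration; the abstract only states that the scheme relies on global substitution, deletion and insertion rates.

```python
import random


def augment_tokens(tokens, vocab, sub_rate, del_rate, ins_rate, rng):
    """Corrupt a token sequence using three global error rates.

    Hypothetical helper: each input token is independently deleted,
    substituted, or kept, and a random word may be inserted after it.
    Replacement words are drawn uniformly from `vocab` (an assumption),
    so a substitution can occasionally reproduce the original word.
    """
    out = []
    for tok in tokens:
        r = rng.random()
        if r < del_rate:
            pass  # deletion: drop the token
        elif r < del_rate + sub_rate:
            out.append(rng.choice(vocab))  # substitution
        else:
            out.append(tok)  # token kept unchanged
        if rng.random() < ins_rate:
            out.append(rng.choice(vocab))  # insertion after this position
    return out


# Usage with made-up rates; in practice they would be estimated from
# ASR output aligned against reference transcripts.
rng = random.Random(0)
vocab = ["the", "a", "cat", "dog", "sat", "mat"]
clean = "the cat sat on the mat".split()
noisy = augment_tokens(clean, vocab, sub_rate=0.2, del_rate=0.1,
                       ins_rate=0.05, rng=rng)
print(" ".join(noisy))
```

The per-word alternative mentioned in the abstract would replace the three scalar rates with statistics conditioned on each word; the abstract reports that the global variant sketched here worked better.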

Code Implementations · 1 repo
