CL, LG, ML | Dec 10, 2016

Active Learning for Speech Recognition: the Power of Gradients

arXiv:1612.03226v1 | 68 citations
Originality: Incremental advance
AI Analysis

This work addresses the high cost of labeling audio data for speech recognition systems, offering a practical improvement for developers and researchers in the field, though it is incremental as it builds on existing active learning methods.

The paper tackles the problem of reducing labeling costs in speech recognition by investigating a gradient-based active learning method, Expected Gradient Length (EGL). It shows that, compared to random sampling, EGL can reduce word errors by 11% or, alternatively, halve the number of samples that need to be labeled.

In training speech recognition systems, labeling audio clips can be expensive, and not all data is equally valuable. Active learning aims to label only the most informative samples to reduce cost. For speech recognition, confidence scores and other likelihood-based active learning methods have been shown to be effective. Gradient-based active learning methods, however, are still not well-understood. This work investigates the Expected Gradient Length (EGL) approach in active learning for end-to-end speech recognition. We justify EGL from a variance reduction perspective, and observe that EGL's measure of informativeness picks novel samples uncorrelated with confidence scores. Experimentally, we show that EGL can reduce word errors by 11%, or alternatively, reduce the number of samples to label by 50%, when compared to random sampling.
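To make the idea of Expected Gradient Length concrete, here is a minimal sketch of the scoring rule on a toy linear softmax classifier. This is an illustration only, not the paper's implementation: the abstract does not spell out how the score is computed for an end-to-end speech model, and the function names `egl_scores` and `select_for_labeling` are hypothetical. The sketch assumes a cross-entropy loss, for which the gradient with respect to the weight matrix under a candidate label y is the outer product of (p - e_y) and x, so its norm has a closed form.

```python
import numpy as np


def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def egl_scores(W, X):
    """Expected gradient length of each unlabeled row of X.

    Toy assumption: logits = W @ x with cross-entropy loss, so the gradient
    w.r.t. W for a candidate label y is outer(p - e_y, x), whose Frobenius
    norm is ||p - e_y|| * ||x||. The score is that norm averaged over
    candidate labels, weighted by the model's own predicted probabilities.
    """
    P = softmax(X @ W.T)                      # (n_samples, n_classes)
    x_norm = np.linalg.norm(X, axis=1)        # (n_samples,)
    # ||p - e_y||^2 = ||p||^2 - 2 * p_y + 1, computed for every label y.
    sq = (P ** 2).sum(axis=1, keepdims=True) - 2.0 * P + 1.0
    resid_norm = np.sqrt(np.maximum(sq, 0.0))  # (n_samples, n_classes)
    return (P * resid_norm).sum(axis=1) * x_norm


def select_for_labeling(W, X, budget):
    # Pick the `budget` samples with the largest expected gradient length.
    return np.argsort(-egl_scores(W, X))[:budget]
```

In an active learning loop under these assumptions, `select_for_labeling` would be called on the unlabeled pool after each training round, the chosen samples sent for labeling, and the model retrained. A confidence-based baseline would instead rank samples by the maximum predicted probability.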

