LG · Oct 30, 2022

Partitioned Gradient Matching-based Data Subset Selection for Compute-Efficient Robust ASR Training

arXiv:2210.16892v1291 citations · h-index: 27
Originality: Incremental advance
AI Analysis

This work addresses the financial and environmental costs of ASR training for researchers and practitioners, offering a compute-efficient approach with minimal performance loss.

The paper tackles the high computational cost of training state-of-the-art ASR systems such as RNN-T by proposing Partitioned Gradient Matching (PGM), a distributable data subset selection algorithm that achieves a 3x to 6x speedup with under 1% absolute WER degradation on Librispeech 100H and Librispeech 960H.

Training state-of-the-art ASR systems such as RNN-T often carries a high financial and environmental cost. Training on a subset of the training data could mitigate this problem if the selected subset achieves performance on par with training on the entire dataset. Although there are many data subset selection (DSS) algorithms, applying them directly to RNN-T is difficult, especially for DSS algorithms that are adaptive and use learning dynamics such as gradients, because RNN-T gradients tend to have a significantly larger memory footprint. In this paper, we propose Partitioned Gradient Matching (PGM), a novel distributable DSS algorithm suitable for massive datasets like those used to train RNN-T. Through extensive experiments on Librispeech 100H and Librispeech 960H, we show that PGM achieves a 3x to 6x speedup with only a very small accuracy degradation (under 1% absolute WER difference). In addition, we demonstrate similar results for PGM even in settings where the training data is corrupted with noise.
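
To make the idea of partitioned, gradient-based subset selection more concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the matching routine, so a simple greedy heuristic stands in for it, and the names `greedy_match` and `partitioned_select`, the contiguous partitioning, and the per-partition budget rule are all hypothetical. Each partition is processed independently (which is what would make the procedure distributable), and within a partition the sketch picks examples whose average gradient stays close to the partition's average gradient.

```python
from typing import List, Sequence

Vector = Sequence[float]


def mean(vectors: List[Vector]) -> List[float]:
    """Coordinate-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def greedy_match(grads: List[Vector], budget: int) -> List[int]:
    """Greedy stand-in for gradient matching (an assumption, not the paper's solver).

    Repeatedly adds the example that brings the mean gradient of the
    selected set closest to the mean gradient of all examples in `grads`.
    """
    target = mean(grads)
    chosen: List[int] = []
    running_sum = [0.0] * len(target)
    for _ in range(min(budget, len(grads))):
        best_idx, best_err = -1, float("inf")
        for i, g in enumerate(grads):
            if i in chosen:
                continue
            k = len(chosen) + 1
            candidate_mean = [(s + x) / k for s, x in zip(running_sum, g)]
            err = sq_dist(candidate_mean, target)
            if err < best_err:
                best_idx, best_err = i, err
        chosen.append(best_idx)
        running_sum = [s + x for s, x in zip(running_sum, grads[best_idx])]
    return chosen


def partitioned_select(
    grads: List[Vector], num_partitions: int, fraction: float
) -> List[int]:
    """Split the data into contiguous partitions and select within each one.

    Only one partition's gradients are examined at a time, so each
    partition could in principle be handled by a separate worker.
    Returns sorted indices into the original `grads` list.
    """
    n = len(grads)
    size = -(-n // num_partitions)  # ceiling division
    selected: List[int] = []
    for start in range(0, n, size):
        block = grads[start:start + size]
        budget = max(1, round(fraction * len(block)))
        local = greedy_match(block, budget)
        selected.extend(start + i for i in local)
    return sorted(selected)
```

In this toy form, `grads` holds one small gradient vector per training example; a real RNN-T setup would need some compact gradient representation per example or batch, which is exactly the memory concern the abstract raises and which this sketch does not address.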
