LG · Oct 30, 2022

Partitioned Gradient Matching-based Data Subset Selection for Compute-Efficient Robust ASR Training

Ashish Mittal, Durga Sivasubramanian, Rishabh Iyer, Preethi Jyothi, Ganesh Ramakrishnan

arXiv:2210.16892v143.8291 citations · h-index: 27

Originality: Incremental advance

AI Analysis

The work addresses the financial and environmental costs of ASR training for researchers and practitioners, offering a compute-efficient approach with minimal performance loss.

The paper tackled the high computational cost of training state-of-the-art ASR systems like RNN-T by proposing Partitioned Gradient Matching (PGM), a distributable data subset selection algorithm, which achieved a 3x to 6x speedup with under 1% absolute WER degradation on Librispeech 100H and Librispeech 960H.

Training state-of-the-art ASR systems such as RNN-T often has a high associated financial and environmental cost. Training on a subset of the training data could mitigate this problem if the selected subset achieves performance on par with training on the entire dataset. Although there are many data subset selection (DSS) algorithms, applying them directly to RNN-T is difficult. This is especially true of adaptive DSS algorithms that use learning dynamics such as gradients, because RNN-T gradients tend to have a significantly larger memory footprint. In this paper, we propose Partitioned Gradient Matching (PGM), a novel distributable DSS algorithm suitable for massive datasets like those used to train RNN-T. Through extensive experiments on Librispeech 100H and Librispeech 960H, we show that PGM achieves a 3x to 6x speedup with only a very small accuracy degradation (under 1% absolute WER difference). In addition, we demonstrate similar results for PGM even in settings where the training data is corrupted with noise.
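
To make the idea of partitioned gradient matching more concrete, here is a minimal sketch. It is not the authors' implementation, and the abstract does not specify how partitions are formed, which matching solver is used, or how RNN-T gradients are computed. The sketch assumes random partitions, an orthogonal-matching-pursuit-style greedy selection with unconstrained least-squares weights, and random toy vectors standing in for per-example gradients. The names `greedy_gradient_match` and `partitioned_gradient_match` are hypothetical.

```python
import numpy as np


def greedy_gradient_match(grads, budget):
    """Greedily pick rows of `grads` whose weighted sum approximates the
    sum of all rows (OMP-style; an assumed solver, not the paper's)."""
    target = grads.sum(axis=0)            # full gradient of this partition
    residual = target.copy()
    selected = []
    available = np.ones(len(grads), dtype=bool)
    weights = np.empty(0)
    for _ in range(min(budget, len(grads))):
        # Score each unselected example by alignment with the residual.
        scores = np.where(available, grads @ residual, -np.inf)
        best = int(np.argmax(scores))
        selected.append(best)
        available[best] = False
        # Refit weights of all selected examples by least squares.
        basis = grads[selected].T         # shape (dim, num_selected)
        weights, *_ = np.linalg.lstsq(basis, target, rcond=None)
        residual = target - basis @ weights
    return selected, weights


def partitioned_gradient_match(grads, num_partitions, fraction, seed=0):
    """Run gradient matching independently in each partition and take the
    union of the selections. Each partition only needs its own gradients,
    so the per-partition calls could run on separate workers."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(grads))   # assumed: random partitioning
    chosen, weights = [], []
    for part in np.array_split(order, num_partitions):
        budget = max(1, int(round(fraction * len(part))))
        local_idx, local_w = greedy_gradient_match(grads[part], budget)
        chosen.extend(part[local_idx].tolist())
        weights.extend(local_w.tolist())
    return np.array(chosen, dtype=int), np.array(weights)


# Toy usage: 40 "examples" with 8-dimensional stand-in gradients.
toy_grads = np.random.default_rng(1).normal(size=(40, 8))
subset, subset_weights = partitioned_gradient_match(
    toy_grads, num_partitions=4, fraction=0.3
)
print(len(subset), "examples selected out of", len(toy_grads))
```

The property this sketch illustrates is that no step ever holds the gradients of the whole dataset at once: each partition is matched against its own gradient sum, which is what makes such a scheme distributable in principle.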
