CL LG SD AS

Jun 26, 2024

Token-Weighted RNN-T for Learning from Flawed Data

arXiv:2406.18108v1

1.91 citations

Originality: Incremental advance

AI Analysis

This paper addresses the accuracy degradation that ASR models suffer when trained on flawed data, an incremental improvement for speech recognition applications.

The paper tackles the problem of training ASR models with flawed data, such as transcription errors in pseudo-labels or human annotations, by proposing a token-weighted RNN-T criterion to de-emphasize erroneous tokens. Results show up to 38% relative accuracy improvement in semi-supervised learning and recovery of 64%-99% of accuracy loss from transcription errors.
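The two headline figures are ratios, and a small sketch may help in reading them. The definitions below are an assumed interpretation (relative WER reduction against a baseline, and the share of the noisy-versus-clean gap that is won back); the summary does not spell out the exact formulas, and the function names and numbers are hypothetical.

```python
def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction, as a fraction of the baseline WER (assumed definition)."""
    return (baseline_wer - new_wer) / baseline_wer


def recovered_fraction(clean_wer: float, noisy_wer: float, method_wer: float) -> float:
    """Share of the degradation (noisy - clean) that the method wins back (assumed definition)."""
    return (noisy_wer - method_wer) / (noisy_wer - clean_wer)


# Hypothetical numbers, chosen only to make the arithmetic visible;
# they are not results from the paper.
print(relative_improvement(baseline_wer=10.0, new_wer=8.0))
print(recovered_fraction(clean_wer=5.0, noisy_wer=9.0, method_wer=6.0))
```

Under these assumed definitions, a model that drops from 10.0 to 8.0 WER shows a 20% relative improvement, and a model that reaches 6.0 WER when clean training gives 5.0 and noisy training gives 9.0 has recovered 75% of the loss.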

ASR models are commonly trained with the cross-entropy criterion to increase the probability of a target token sequence. While optimizing the probability of all tokens in the target sequence is sensible, one may want to de-emphasize tokens that reflect transcription errors. In this work, we propose a novel token-weighted RNN-T criterion that augments the RNN-T objective with token-specific weights. The new objective is used for mitigating accuracy loss from transcription errors in the training data, which naturally appear in two settings: pseudo-labeling and human annotation errors. Experimental results show that using our method for semi-supervised learning with pseudo-labels leads to a consistent accuracy improvement, up to 38% relative. We also analyze the accuracy degradation resulting from different levels of WER in the reference transcription, and show that token-weighted RNN-T is suitable for overcoming this degradation, recovering 64%-99% of the accuracy loss.
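The idea of token-specific weights can be illustrated with a minimal sketch. This is not the paper's RNN-T formulation, which the abstract does not spell out: it assumes the per-token conditional log-probabilities are already available (in an RNN-T they would have to be derived by marginalizing over alignments) and simply scales each token's contribution to the negative log-likelihood. The function name, the probabilities, and the weights are all hypothetical.

```python
import math
from typing import Sequence


def token_weighted_nll(token_log_probs: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted negative log-likelihood: -sum_u w_u * log p(y_u | y_<u, x)."""
    if len(token_log_probs) != len(weights):
        raise ValueError("need one weight per target token")
    return -sum(w * lp for w, lp in zip(weights, token_log_probs))


# Hypothetical per-token probabilities for a 4-token transcript.
log_probs = [math.log(p) for p in (0.9, 0.8, 0.1, 0.7)]

# All weights equal to 1 gives the ordinary sequence-level loss.
uniform = token_weighted_nll(log_probs, [1.0, 1.0, 1.0, 1.0])

# Assumption: the third token is suspected to be a transcription error,
# so its weight is reduced and it contributes less to the loss.
downweighted = token_weighted_nll(log_probs, [1.0, 1.0, 0.2, 1.0])

print(uniform, downweighted)
```

With uniform weights the sketch reduces to the unweighted loss, so the weights act purely as a de-emphasis mechanism. How the weights are chosen (for example from a confidence score) is left open here.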
