AS AI CL LG SD · Apr 9, 2025

RNN-Transducer-based Losses for Speech Recognition on Noisy Targets

arXiv:2504.06963v1 · 2.32 citations · Has Code

Originality: Incremental advance

AI Analysis

This work addresses a challenge in industrial pipelines, where large datasets often contain transcription errors, and offers incremental improvements that mitigate these issues.

The paper tackles the problem of training speech recognition systems on noisy transcripts by introducing novel loss functions for RNN-Transducer models. Its results show that the Target-Robust Transducer loss restores over 70% of the quality, relative to models trained on accurate transcripts.

Training speech recognition systems on noisy transcripts is a significant challenge in industrial pipelines, where datasets are enormous and ensuring accurate transcription for every instance is difficult. In this work, we introduce novel loss functions to mitigate the impact of transcription errors in RNN-Transducer models. Our Star-Transducer loss addresses deletion errors by incorporating "skip frame" transitions in the loss lattice, restoring over 90% of the system's performance compared to models trained with accurate transcripts. The Bypass-Transducer loss uses "skip token" transitions to tackle insertion errors, recovering more than 60% of the quality. Finally, the Target-Robust Transducer loss merges these approaches, offering robust performance against arbitrary errors. Experimental results demonstrate that the Target-Robust Transducer loss significantly improves RNN-T performance on noisy data by restoring over 70% of the quality compared to well-transcribed data.
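The lattice changes described in the abstract can be illustrated with a forward-algorithm sketch over the RNN-T lattice. This is a minimal sketch, not the authors' implementation: it assumes the joint network's log-probabilities arrive as a nested list of shape [T][U+1][V], and it models the "skip frame" and "skip token" transitions with hypothetical constant log-weights (`skip_frame_logp`, `skip_token_logp`). The paper's actual weighting of these transitions may differ, and the function name `transducer_loss` is illustrative only.

```python
import math
from typing import List, Optional

NEG_INF = float("-inf")


def logaddexp(a: float, b: float) -> float:
    """Numerically stable log(exp(a) + exp(b))."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def transducer_loss(
    log_probs: List[List[List[float]]],       # [T][U+1][V] joint-network log-probs
    targets: List[int],                       # U target token ids (possibly noisy)
    blank: int = 0,
    skip_frame_logp: Optional[float] = None,  # hypothetical "skip frame" arc weight
    skip_token_logp: Optional[float] = None,  # hypothetical "skip token" arc weight
) -> float:
    """Negative log-likelihood over an RNN-T lattice with optional extra arcs.

    No extra arcs       -> standard RNN-T loss.
    skip_frame_logp set -> Star-Transducer-style lattice (sketch only).
    skip_token_logp set -> Bypass-Transducer-style lattice (sketch only).
    Both set            -> Target-Robust-style lattice (sketch only).
    """
    T, U = len(log_probs), len(targets)
    # alpha[t][u]: log-prob of having consumed t frames and u target tokens.
    alpha = [[NEG_INF] * (U + 1) for _ in range(T + 1)]
    alpha[0][0] = 0.0
    for t in range(T + 1):
        for u in range(U + 1):
            if t == 0 and u == 0:
                continue
            score = NEG_INF
            if t > 0:
                # Frame arc: leave frame t-1 by emitting blank ...
                step = log_probs[t - 1][u][blank]
                if skip_frame_logp is not None:
                    # ... or via a "skip frame" arc that does not need blank.
                    step = logaddexp(step, skip_frame_logp)
                score = logaddexp(score, alpha[t - 1][u] + step)
            if u > 0 and t < T:
                # Token arc: emit target token u-1 at frame t ...
                step = log_probs[t][u - 1][targets[u - 1]]
                if skip_token_logp is not None:
                    # ... or bypass that token via a "skip token" arc.
                    step = logaddexp(step, skip_token_logp)
                score = logaddexp(score, alpha[t][u - 1] + step)
            alpha[t][u] = score
    # Reaching (T, U) means every frame was consumed and every token covered.
    return -alpha[T][U]


# Toy example: T=1 frame, U=1 target token, vocabulary {0: blank, 1: "a"},
# with every joint output uniform over the two symbols.
half = math.log(0.5)
toy = [[[half, half], [half, half]]]
print(transducer_loss(toy, [1]))  # -log(0.25), i.e. log(4)
```

Because the extra arcs only add probability mass to the lattice, each variant's loss is never higher than the standard RNN-T loss on the same inputs. In the toy example, setting both hypothetical weights to log 0.5 drives the loss to zero: the skip arcs alone can cover the whole lattice without consulting the model. In this sketch the skip weights therefore act as penalties and would need to be much smaller than typical model probabilities, so that the skip arcs are preferred only where the transcript and the audio disagree.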

