Better Intermediates Improve CTC Inference
This work offers incremental gains in speech recognition accuracy for systems built on CTC models, obtained by improving how the model is conditioned at inference time.
The paper tackles CTC inference by proposing two ways to condition the model better, searched intermediates and multi-pass conditioning, which yield relative performance improvements of up to 3% on test-clean and up to 12% on test-other in LibriSpeech over the original self-conditioned CTC.
This paper proposes a method for improved CTC inference with searched intermediates and multi-pass conditioning. We first formulate self-conditioned CTC as a probabilistic model with an intermediate prediction as a latent representation and provide a tractable conditioning framework. We then propose two new conditioning methods based on this formulation: (1) searched intermediate conditioning, which refines intermediate predictions with beam search, and (2) multi-pass conditioning, which uses the predictions of a previous inference pass to condition the next one. These approaches enable better conditioning than the original self-conditioned CTC during inference and improve the final performance. Experiments on the LibriSpeech dataset show relative performance improvements of up to 3%/12% on the test-clean/test-other sets compared to the original self-conditioned CTC.
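The conditioning ideas above can be illustrated with a toy sketch. It is not the authors' implementation: the class `ToySelfConditionedCTC`, the function `multi_pass`, the `external` and `harden` parameters, the random weights, and the layer sizes are all hypothetical. Two simplifications are assumed: a per-frame argmax one-hot stands in for the beam-searched intermediate, and multi-pass conditioning is read as feeding the previous pass's final posterior into every intermediate layer of the next pass.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over the last axis."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def greedy_ctc_collapse(posteriors, blank=0):
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    out, prev = [], None
    for p in posteriors.argmax(axis=-1):
        if p != prev and p != blank:
            out.append(int(p))
        prev = p
    return out


class ToySelfConditionedCTC:
    """Toy encoder with random weights (hypothetical, for illustration only)."""

    def __init__(self, dim=8, vocab=5, layers=4, seed=0):
        rng = np.random.default_rng(seed)
        self.layers = [rng.normal(scale=0.3, size=(dim, dim)) for _ in range(layers)]
        self.to_vocab = rng.normal(scale=0.3, size=(dim, vocab))   # shared output head
        self.to_hidden = rng.normal(scale=0.3, size=(vocab, dim))  # conditioning projection

    def forward(self, x, external=None, harden=False):
        """x: (frames, dim). Returns final per-frame posteriors (frames, vocab).

        external: if given, this posterior is used for conditioning instead of
                  each layer's own intermediate prediction.
        harden:   if True, the conditioning posterior is replaced by a one-hot
                  argmax (an assumed stand-in for a searched intermediate).
        """
        h = x
        last = len(self.layers) - 1
        for i, w in enumerate(self.layers):
            h = np.tanh(h @ w)
            if i < last:
                # Intermediate prediction used to condition the next layer.
                z = softmax(h @ self.to_vocab) if external is None else external
                if harden:
                    z = np.eye(z.shape[-1])[z.argmax(axis=-1)]
                h = h + z @ self.to_hidden
        return softmax(h @ self.to_vocab)


def multi_pass(model, x, passes=2):
    """First pass is plain self-conditioning; later passes condition on the
    previous pass's final prediction (assumed reading of multi-pass conditioning)."""
    z = None
    for _ in range(passes):
        z = model.forward(x, external=z)
    return z
```

A call such as `greedy_ctc_collapse(multi_pass(ToySelfConditionedCTC(), x))`, with `x` an array of shape `(frames, 8)`, returns a token-index sequence. Because the weights are random, the output is meaningless; the sketch only shows where the conditioning signal enters the computation.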