CL, SD, AS | Sep 24, 2024

Spelling Correction through Rewriting of Non-Autoregressive ASR Lattices

Leonid Velikovich, Christopher Li, Diamantino Caseiro, Shankar Kumar, Pat Rondon, Kandarp Joshi, Xavier Velez

arXiv:2409.16469v1 | 1.92 citations | h-index: 17

Originality: Incremental advance

AI Analysis

This work addresses the challenge of recognizing personal or rare phrases in ASR systems, offering a practical improvement for speech recognition applications.

The paper improves spelling correction for non-autoregressive ASR models with an FST-based technique that rewrites wordpiece lattices without retraining the model, achieving up to a 15.2% relative reduction in sentence error rate on a test set with contextually relevant entities.

For end-to-end Automatic Speech Recognition (ASR) models, recognizing personal or rare phrases can be hard. A promising way to improve accuracy is through spelling correction (or rewriting) of the ASR lattice, where potentially misrecognized phrases are replaced with acoustically similar and contextually relevant alternatives. However, rewriting is challenging for ASR models trained with connectionist temporal classification (CTC) due to noisy hypotheses produced by a non-autoregressive, context-independent beam search. We present a finite-state transducer (FST) technique for rewriting wordpiece lattices generated by Transformer-based CTC models. Our algorithm performs grapheme-to-phoneme (G2P) conversion directly from wordpieces into phonemes, avoiding explicit word representations and exploiting the richness of the CTC lattice. Our approach requires no retraining or modification of the ASR model. We achieved up to a 15.2% relative reduction in sentence error rate (SER) on a test set with contextually relevant entities.
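The rewriting idea in the abstract can be illustrated with a toy sketch. This is not the paper's FST implementation: it assumes a single wordpiece hypothesis instead of a lattice, uses plain Python lists instead of transducers, and substitutes phoneme-level edit distance for acoustic similarity. The pronunciation table, the `_` word-boundary marker, and the names `to_phonemes`, `edit_distance`, and `rewrite` are all hypothetical and exist only for this example.

```python
# Toy illustration only. The table below is invented for this example;
# "_" is a stand-in for a wordpiece word-boundary marker.
WORDPIECE_PHONEMES = {
    "_call": ["k", "ao", "l"],
    "_ka": ["k", "ey"],
    "tie": ["t", "iy"],
    "_kai": ["k", "ey"],
    "ty": ["t", "iy"],
}


def to_phonemes(wordpieces):
    """Map wordpieces straight to phonemes, without forming words first."""
    phonemes = []
    for wp in wordpieces:
        phonemes.extend(WORDPIECE_PHONEMES[wp])
    return phonemes


def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution or match
        prev = curr
    return prev[-1]


def rewrite(hypothesis, entities, max_distance=1):
    """Replace the span that sounds most like a contextual entity.

    Returns the hypothesis unchanged if no span is within max_distance.
    """
    best = None  # (distance, start, end, entity)
    for entity in entities:
        target = to_phonemes(entity)
        for start in range(len(hypothesis)):
            for end in range(start + 1, len(hypothesis) + 1):
                d = edit_distance(to_phonemes(hypothesis[start:end]), target)
                if d <= max_distance and (best is None or d < best[0]):
                    best = (d, start, end, entity)
    if best is None:
        return list(hypothesis)
    _, start, end, entity = best
    return hypothesis[:start] + list(entity) + hypothesis[end:]


# The misrecognized span "_ka tie" shares its phonemes with the contact "_kai ty".
print(rewrite(["_call", "_ka", "tie"], [["_kai", "ty"]]))
# prints ['_call', '_kai', 'ty']
```

Working at the phoneme level is what lets a misspelled span match an entity whose wordpieces differ entirely. A real system in the style the abstract describes would score alternatives across the whole lattice rather than a single hypothesis.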
