CL · SD · AS · Mar 1, 2023

N-best T5: Robust ASR Error Correction using Multiple Input Hypotheses and Constrained Decoding Space

arXiv:2303.00456v3 · 52 citations · h-index: 61
Originality: Incremental advance
AI Analysis

This work addresses ASR error correction to improve transcription quality. It is rated as incremental because it builds on existing T5 models and N-best list methods.

The paper improves ASR error correction by feeding multiple input hypotheses to the model and constraining the decoding space, and it outperforms a strong Conformer-Transducer baseline.

Error correction models form an important part of Automatic Speech Recognition (ASR) post-processing to improve the readability and quality of transcriptions. Most prior works use the 1-best ASR hypothesis as input and therefore can only perform correction by leveraging the context within one sentence. In this work, we propose a novel N-best T5 model for this task, which is fine-tuned from a T5 model and utilizes ASR N-best lists as model input. By transferring knowledge from the pre-trained language model and obtaining richer information from the ASR decoding space, the proposed approach outperforms a strong Conformer-Transducer baseline. Another issue with standard error correction is that the generation process is not well-guided. To address this, a constrained decoding process, based on either the N-best list or an ASR lattice, is used, which allows additional information to be propagated.
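The two ideas in the abstract, N-best input and N-best constrained decoding, can be illustrated with a minimal sketch. This is not the paper's implementation: the `<sep>` separator, the function names, and the word-voting `toy_score` are all assumptions made for illustration. In a real system the scorer would be the fine-tuned model's log-probability of a candidate given the N-best input.

```python
from typing import Callable, Sequence

# Hypothetical separator; the paper's actual input format may differ.
SEP = " <sep> "


def build_nbest_input(hypotheses: Sequence[str], n: int = 5) -> str:
    """Concatenate the top-n ASR hypotheses into a single model input string."""
    top = list(hypotheses[:n])
    if not top:
        raise ValueError("need at least one hypothesis")
    return SEP.join(top)


def constrained_select(
    hypotheses: Sequence[str],
    score_fn: Callable[[str, str], float],
    n: int = 5,
) -> str:
    """Restrict the output space to the N-best list and return the best-scoring entry."""
    source = build_nbest_input(hypotheses, n)
    return max(hypotheses[:n], key=lambda hyp: score_fn(source, hyp))


def toy_score(source: str, candidate: str) -> float:
    """Stand-in scorer (assumption): average number of hypotheses that contain each word."""
    hyps = [h.split() for h in source.split(SEP)]
    words = candidate.split()
    votes = sum(sum(w in h for h in hyps) for w in words)
    return votes / max(len(words), 1)


nbest = [
    "the cat sat on the map",   # 1-best from the ASR system, contains an error
    "the cat sat on the mat",
    "the cat sad on the mat",
]
best = constrained_select(nbest, toy_score)
print(best)
```

Because the output is always one of the input hypotheses, this variant cannot produce text the ASR system never proposed. That is the trade-off of constraining the decoding space: less freedom to fix errors shared by all hypotheses, but no unguided generation.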
