Confidence-Guided Error Correction for Disordered Speech Recognition
This work addresses the problem of improving ASR accuracy for impaired speech, an incremental advancement within a domain-specific application.
The paper tackles error correction in disordered speech recognition by using confidence-informed prompting with LLMs, achieving a 10% relative WER reduction over naive LLM correction on spontaneous speech from the Speech Accessibility Project and a 47% reduction on TORGO.
We investigate the use of large language models (LLMs) as post-processing modules for automatic speech recognition (ASR), focusing on their ability to perform error correction for disordered speech. In particular, we propose confidence-informed prompting, where word-level uncertainty estimates are embedded directly into LLM training to improve robustness and generalization across speakers and datasets. This approach directs the model to uncertain ASR regions and reduces overcorrection. We fine-tune a LLaMA 3.1 model and compare our approach to both transcript-only fine-tuning and post hoc confidence-based filtering. Evaluations show that our method achieves a 10% relative WER reduction compared to naive LLM correction on spontaneous speech from the Speech Accessibility Project and a 47% reduction on TORGO, demonstrating the effectiveness of confidence-aware fine-tuning for impaired speech.
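As a rough illustration of confidence-informed prompting, the sketch below attaches a word-level confidence score to each ASR token before the transcript is passed to an LLM corrector. The abstract does not specify the prompt format, so the bracketed tag layout, the instruction wording, and the function names `format_confidence_prompt` and `relative_wer_reduction` are all assumptions made for illustration, and the WER values in the usage lines are invented rather than taken from the paper.

```python
from typing import List, Tuple


def format_confidence_prompt(words: List[Tuple[str, float]]) -> str:
    """Render an ASR hypothesis with per-word confidence tags.

    Hypothetical format: each word is followed by its confidence in
    square brackets, e.g. "cat[0.31]". The paper's actual format is
    not given in the abstract.
    """
    annotated = " ".join(f"{word}[{conf:.2f}]" for word, conf in words)
    return (
        "Correct the ASR transcript below. Each word is followed by its "
        "confidence in brackets. Keep high-confidence words unchanged and "
        "revise low-confidence words only when needed.\n"
        f"Transcript: {annotated}\n"
        "Corrected:"
    )


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction: (baseline - new) / baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Illustrative inputs only; these are not results from the paper.
prompt = format_confidence_prompt([("the", 0.98), ("cat", 0.31), ("sat", 0.87)])
reduction = relative_wer_reduction(baseline_wer=0.20, new_wer=0.18)
```

Under these assumptions, low-confidence tokens such as `cat[0.31]` are visible to the model as candidates for correction, while high-confidence tokens signal that edits there risk overcorrection. A relative reduction is measured against the baseline WER, so a drop from 0.20 to 0.18 is a 10% relative reduction even though the absolute change is two points.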