LG · May 29, 2025

Prompting Whisper for Improved Verbatim Transcription and End-to-end Miscue Detection

arXiv:2505.23627v1 · h-index: 7 · INTERSPEECH
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of detecting reading mistakes (miscues) more accurately, for applications such as educational assessment and speech therapy. The advance is incremental, since it builds on existing ASR and prompting techniques.

The paper tackles inaccurate verbatim transcription in automatic speech recognition (ASR), which limits reading error detection. It proposes an end-to-end architecture that incorporates the target reading text via prompting. In case studies on children's read-aloud and adult atypical speech, this improves both verbatim transcription and miscue detection over state-of-the-art methods.

Identifying mistakes (i.e., miscues) made while reading aloud is commonly approached post-hoc by comparing automatic speech recognition (ASR) transcriptions to the target reading text. However, post-hoc methods perform poorly when ASR inaccurately transcribes verbatim speech. To improve on current methods for reading error annotation, we propose a novel end-to-end architecture that incorporates the target reading text via prompting and is trained for both improved verbatim transcription and direct miscue detection. Our contributions include: first, demonstrating that incorporating reading text through prompting benefits verbatim transcription performance over fine-tuning, and second, showing that it is feasible to augment speech recognition tasks for end-to-end miscue detection. We conducted two case studies -- children's read-aloud and adult atypical speech -- and found that our proposed strategies improve verbatim transcription and miscue detection compared to current state-of-the-art.
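
To make the post-hoc baseline described above concrete, here is a minimal sketch of comparing an ASR transcript against the target reading text. It is not the paper's proposed end-to-end method. The function name `detect_miscues` and the three miscue labels are assumptions chosen for illustration, and the alignment uses Python's standard `difflib.SequenceMatcher` on lowercased, whitespace-split words rather than any aligner named in the paper.

```python
from difflib import SequenceMatcher


def detect_miscues(target_text, transcript):
    """Label differences between target text and a transcript (hypothetical helper).

    Returns a list of (label, target_words, spoken_words) tuples, where label is
    one of "substitution", "omission", or "insertion".
    """
    # Simplifying assumption: lowercase and split on whitespace; no punctuation handling.
    target = target_text.lower().split()
    spoken = transcript.lower().split()

    # Map difflib's edit tags onto illustrative miscue labels.
    labels = {"replace": "substitution", "delete": "omission", "insert": "insertion"}

    matcher = SequenceMatcher(a=target, b=spoken, autojunk=False)
    miscues = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # words read as written
        miscues.append((labels[tag], target[i1:i2], spoken[j1:j2]))
    return miscues


if __name__ == "__main__":
    for miscue in detect_miscues("the cat sat on the mat", "the cat sit on mat"):
        print(miscue)
```

The sketch also shows why the abstract says post-hoc methods perform poorly when transcription is not verbatim: if the ASR system silently "corrects" a misread word to match the expected text, the word lands in an `equal` span and the miscue is never reported.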

