CL | Aug 18, 2025

Evaluating ASR robustness to spontaneous speech errors: A study of WhisperX using a Speech Error Database

arXiv:2508.13060v1 | 1 citation | h-index: 6 | INTERSPEECH
Originality: Synthesis-oriented
AI Analysis

This work addresses the challenge of assessing ASR systems for real-world spontaneous speech, though it is incremental as it applies an existing method to a new dataset.

The study tackled the problem of evaluating automatic speech recognition (ASR) robustness to spontaneous speech errors by using the Simon Fraser University Speech Error Database (SFUSED) to test WhisperX on 5,300 documented errors, demonstrating the database's effectiveness as a diagnostic tool.

The Simon Fraser University Speech Error Database (SFUSED) is a public data collection developed for linguistic and psycholinguistic research. Here we demonstrate how its design and annotations can be used to test and evaluate speech recognition models. The database comprises systematically annotated speech errors from spontaneous English speech, with each error tagged for intended and actual error productions. The annotation schema incorporates multiple classificatory dimensions that are of some value to model assessment, including linguistic hierarchical level, contextual sensitivity, degraded words, word corrections, and both word-level and syllable-level error positioning. To assess the value of these classificatory variables, we evaluated the transcription accuracy of WhisperX across 5,300 documented word and phonological errors. This analysis demonstrates the database's effectiveness as a diagnostic tool for ASR system performance.
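
To make the evaluation idea concrete, here is a minimal Python sketch of scoring transcription accuracy per annotation category. It is an illustration under stated assumptions, not the paper's method: the record fields (`level`, `actual`, `transcript`) are hypothetical and do not reflect the real SFUSED schema, the transcripts are hand-written stand-ins for ASR output, and the "hit" criterion (the actual error production appears as a token in the transcript) is a simplification chosen for this example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def accuracy_by_category(records):
    """Return, per category, the share of errors whose actual
    production appears as a token in the ASR transcript.

    Each record is a dict with hypothetical keys:
      level      - annotation category (e.g. "word", "phonological")
      actual     - what the speaker actually said (single token assumed)
      transcript - the ASR output for the utterance
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for rec in records:
        totals[rec["level"]] += 1
        if rec["actual"].lower() in tokenize(rec["transcript"]):
            hits[rec["level"]] += 1
    return {level: hits[level] / totals[level] for level in totals}


# Toy, invented examples; not drawn from SFUSED.
records = [
    {"level": "word", "actual": "fork",
     "transcript": "pass me the fork I mean the spoon"},
    {"level": "word", "actual": "dog",
     "transcript": "the cat ran away"},
    {"level": "phonological", "actual": "lar",
     "transcript": "a large library"},
]

result = accuracy_by_category(records)
```

Grouping by a single field keeps the sketch short; the same loop could be keyed on any other annotated dimension (for example, word corrections or error position) to compare accuracy across categories.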

