CL SD AS · May 31, 2025

Causal Structure Discovery for Error Diagnostics of Children's ASR

Vishwanath Pratap Singh, Md. Sahidullah, Tomi Kinnunen

arXiv:2506.00402v12.7 · h-index: 35 · INTERSPEECH

Originality: Incremental advance

AI Analysis

This paper addresses error diagnostics for children's ASR, which underperforms relative to adult ASR. It contributes a causal analysis that identifies and quantifies the key contributing factors, though the advance is incremental in that it applies causal methods to this domain.

The paper tackles the problem of diagnosing errors in children's automatic speech recognition (ASR). It develops a causal structure discovery method to analyze interdependent physiological, cognitive, and extrinsic factors, and uses causal quantification to measure the impact of each factor on ASR errors. Experiments on Whisper and Wav2Vec2.0 indicate that the findings generalize across ASR systems.

Automatic speech recognition (ASR) for children often underperforms ASR for adults because of a confluence of interdependent factors: physiological (e.g., smaller vocal tracts), cognitive (e.g., underdeveloped pronunciation), and extrinsic (e.g., vocabulary limitations, background noise). Existing analysis methods examine the impact of these factors in isolation and neglect their interdependencies, such as age affecting ASR accuracy both directly and indirectly via pronunciation skills. In this paper, we introduce a causal structure discovery approach to unravel the interdependent relationships among physiology, cognition, extrinsic factors, and ASR errors. We then employ causal quantification to measure each factor's impact on children's ASR. We extend the analysis to fine-tuned models to identify which factors are mitigated by fine-tuning and which remain largely unaffected. Experiments on Whisper and Wav2Vec2.0 demonstrate the generalizability of our findings across different ASR systems.
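The abstract's example of age influencing ASR accuracy both directly and indirectly through pronunciation can be illustrated with a small linear mediation sketch. This is not the paper's method, which is not detailed on this page. It is a minimal illustration under loud assumptions: the data are synthetic, the variable names (`age`, `pron`, `wer`) and all coefficients are hypothetical, and the causal graph (age -> pronunciation -> WER, plus age -> WER) is assumed rather than discovered.

```python
import numpy as np


def simulate(n=20000, seed=0):
    """Generate SYNTHETIC data from an assumed linear causal graph.

    All coefficients are hypothetical and chosen only for illustration:
      age -> pron        (slope 0.5)
      age -> wer         (direct slope -1.0)
      pron -> wer        (slope -2.0)
    """
    rng = np.random.default_rng(seed)
    age = rng.uniform(5, 15, n)                      # child's age in years
    pron = 0.5 * age + rng.normal(0, 1, n)           # pronunciation score
    wer = 60 - 1.0 * age - 2.0 * pron + rng.normal(0, 2, n)  # word error rate
    return age, pron, wer


def ols(y, *xs):
    """Ordinary least squares; returns [intercept, slope_1, slope_2, ...]."""
    X = np.column_stack([np.ones_like(y), *xs])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def decompose(age, pron, wer):
    """Split the effect of age on WER into direct and indirect parts."""
    a = ols(pron, age)[1]                 # age -> pronunciation
    _, direct, b = ols(wer, age, pron)    # age -> WER holding pron fixed; pron -> WER
    total = ols(wer, age)[1]              # age -> WER ignoring the mediator
    return {"direct": direct, "indirect": a * b, "total": total}


if __name__ == "__main__":
    effects = decompose(*simulate())
    for name, value in effects.items():
        print(f"{name:>8}: {value:+.3f}")
```

Regressing WER on age alone recovers only the total effect. Adding the mediator to the regression separates the direct path from the path through pronunciation, which is the kind of interdependency that an analysis of each factor in isolation would miss. In a real setting the graph would have to be estimated first, and linearity would be an additional assumption to check.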
