Analyzing Error Propagation in Korean Spoken QA with ASR-LLM Cascades

arXiv:2605.17443

AI Analysis

For researchers building spoken QA systems for Korean, this work identifies specific failure modes of ASR-LLM cascades and suggests direct audio input as a mitigation strategy.

The paper analyzes how ASR errors propagate through ASR-LLM cascades in Korean spoken QA. It finds that relative downstream degradation is consistent across LLMs and largely tracks ASR-stage information loss, and that a single-character transcription error can remove the gold answer from the downstream prediction entirely. In an auxiliary comparison, a large audio language model outperforms a cascade with a matched language backbone in noisy conditions.

We analyze how automatic speech recognition (ASR) errors propagate through ASR-LLM cascades in Korean spoken question answering (SQA), focusing on downstream semantic failures that conventional ASR metrics cannot fully capture. Our analysis shows that the relative downstream degradation caused by ASR errors is consistent across LLMs with different absolute performance, suggesting that cascade degradation largely tracks ASR-stage information loss. We further identify single-character Korean ASR errors as a distinct semantic-failure channel, where the gold answer becomes entirely absent from the downstream prediction despite only a minimal transcription difference. Finally, an auxiliary comparison shows that a large audio language model outperforms an ASR-LLM pipeline with a matched language backbone in noisy Korean SQA, indicating the potential of direct audio input to mitigate transcript-induced information loss.
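To make the single-character failure channel concrete, the following is a minimal Python sketch, not the authors' evaluation code. It computes a character error rate (CER) with a plain Levenshtein distance, checks whether a gold answer string survives in a prediction, and computes a relative degradation ratio. The function names, the Korean example strings, and the simple substring check are all illustrative assumptions; the abstract does not specify the paper's metric definitions or data.

```python
import unicodedata


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in code points."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalize(text: str) -> str:
    """NFC-compose Hangul so each syllable is one code point; drop spaces."""
    return unicodedata.normalize("NFC", text).replace(" ", "")


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    ref_chars, hyp_chars = normalize(ref), normalize(hyp)
    return edit_distance(ref_chars, hyp_chars) / max(len(ref_chars), 1)


def gold_answer_present(gold: str, prediction: str) -> bool:
    """Assumed containment check: does the gold answer appear in the prediction?"""
    return normalize(gold) in normalize(prediction)


def relative_degradation(clean_score: float, noisy_score: float) -> float:
    """Fraction of the clean-transcript score lost when ASR output is used."""
    return (clean_score - noisy_score) / clean_score


# Hypothetical example (not from the paper): one syllable turns
# "부산" (Busan) into "부천" (Bucheon) in the transcript.
reference = "부산의 인구는 얼마인가요"
hypothesis = "부천의 인구는 얼마인가요"

# Made-up downstream predictions, for illustration only.
gold = "부산"
prediction_from_reference = "정답은 부산입니다"
prediction_from_hypothesis = "정답은 부천입니다"

print(f"CER: {cer(reference, hypothesis):.3f}")
print("gold present (clean):", gold_answer_present(gold, prediction_from_reference))
print("gold present (ASR):  ", gold_answer_present(gold, prediction_from_hypothesis))
```

In this toy case the transcript differs by one substitution out of eleven characters, so CER stays low, yet the gold answer is absent from the second prediction. That gap between a small transcription metric and a total downstream miss is the kind of failure the abstract describes as not fully captured by conventional ASR metrics.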
