CL SD AS | May 22, 2024

You don't understand me!: Comparing ASR results for L1 and L2 speakers of Swedish

arXiv:2405.13379v1 | 27 citations | h-index: 25 | INTERSPEECH
Originality: Synthesis-oriented
AI Analysis

This paper addresses the reliability of ASR for non-native speakers and in challenging conditions, which is crucial for applications such as educational software. The contribution is incremental, however, because it focuses on comparing existing services.

The study tackled the performance gap in Automatic Speech Recognition (ASR) systems between native and non-native speakers of Swedish, finding that recognition results vary significantly for read and spontaneous utterances, with analysis of linguistic factors contributing to transcription errors.

The performance of Automatic Speech Recognition (ASR) systems has steadily improved with state-of-the-art development. However, performance tends to decrease considerably in more challenging conditions (e.g., background noise, multi-speaker social conversations) and with more atypical speakers (e.g., children, non-native speakers, or people with speech disorders). This means that general improvements do not necessarily transfer to applications that rely on ASR, e.g., educational software for younger students or language learners. In this study, we focus on the gap in performance between recognition results for native and non-native, read and spontaneous, Swedish utterances transcribed by different ASR services. We compare the recognition results using Word Error Rate and analyze the linguistic factors that may generate the observed transcription errors.
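
For readers unfamiliar with the metric, Word Error Rate is conventionally defined as the number of word substitutions, deletions, and insertions needed to turn the ASR output into the reference transcript, divided by the number of words in the reference. The sketch below is a minimal illustration of that standard definition, not the authors' evaluation code. The function name `word_error_rate`, the lowercasing, and the whitespace tokenization are assumptions made for this example; the abstract does not say how the transcripts were normalized.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: text is lowercased and split on whitespace. Real evaluations
    often also strip punctuation and normalize numbers.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from output
            insertion = curr[j - 1] + 1   # extra word in the output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


# Hypothetical example sentences, not taken from the paper's data.
perfect = word_error_rate("jag heter anna", "jag heter anna")
one_substitution = word_error_rate("jag heter anna", "jag hette anna")
```

Because insertions count as errors, this value can exceed 1.0 when the hypothesis is much longer than the reference.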

