CL SD AS | Feb 1, 2025

Data-Driven Mispronunciation Pattern Discovery for Robust Speech Recognition

Anna Seo Gyeong Choi, Jonghyeon Park, Myungwoo Oh

arXiv:2502.00583v16.73 citations | h-index: 1 | ICASSP

Originality: Incremental advance

AI Analysis

This work offers practical advances for robust Automatic Speech Recognition (ASR) systems, particularly for non-native speakers, but it is incremental because it builds on existing methods.

The paper addresses the problem of recognizing speech from non-fluent or accented speakers by proposing two data-driven approaches that detect mispronunciation patterns automatically. The authors report a 5.7% improvement in speech recognition on native English datasets and a 12.8% improvement for non-native English speakers.

Recent advancements in machine learning have significantly improved speech recognition, but recognizing speech from non-fluent or accented speakers remains a challenge. Previous efforts, relying on rule-based pronunciation patterns, have struggled to fully capture non-native errors. We propose two data-driven approaches using speech corpora to automatically detect mispronunciation patterns. By aligning non-native phones with their native counterparts using attention maps, we achieved a 5.7% improvement in speech recognition on native English datasets and a 12.8% improvement for non-native English speakers, particularly Korean speakers. Our method offers practical advancements for robust Automatic Speech Recognition (ASR) systems, particularly in situations where prior linguistic knowledge is not applicable.
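
The abstract does not spell out how attention maps are turned into mispronunciation patterns, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes an attention matrix whose rows correspond to the phones a speaker produced and whose columns correspond to the canonical (native) phones; each produced phone is paired with the canonical phone it attends to most, and mismatched pairs are counted as candidate substitution patterns. The function names `align_by_attention` and `substitution_counts`, the phone labels, and the attention weights are all hypothetical.

```python
from collections import Counter


def align_by_attention(attention, native_phones, nonnative_phones):
    """Pair each produced phone with the canonical phone it attends to most.

    attention[i][j] is an assumed weight between produced phone i and
    canonical phone j (rows: produced phones, columns: canonical phones).
    """
    pairs = []
    for row, spoken in zip(attention, nonnative_phones):
        # Index of the largest weight in this row (hard argmax alignment).
        best = max(range(len(row)), key=row.__getitem__)
        pairs.append((native_phones[best], spoken))
    return pairs


def substitution_counts(pairs):
    """Count (canonical, produced) pairs in which the two phones differ."""
    return Counter(pair for pair in pairs if pair[0] != pair[1])


# Toy example with made-up weights: canonical R AY S produced as L AY S.
native = ["R", "AY", "S"]
spoken = ["L", "AY", "S"]
attention = [
    [0.80, 0.10, 0.10],
    [0.10, 0.70, 0.20],
    [0.05, 0.15, 0.80],
]

pairs = align_by_attention(attention, native, spoken)
patterns = substitution_counts(pairs)
```

In this toy case the only mismatched pair is canonical `R` produced as `L`. Aggregating such counts over a whole corpus would give a data-driven list of frequent substitutions; a real system would more likely work with soft attention weights and handle insertions and deletions, which this sketch ignores.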
