CL, SD, AS · Feb 1, 2025

Data-Driven Mispronunciation Pattern Discovery for Robust Speech Recognition

arXiv:2502.00583v1 · 3 citations · h-index: 1 · ICASSP
Originality: Incremental advance
AI Analysis

This work provides practical advances for robust Automatic Speech Recognition (ASR) systems, particularly for non-native speakers, but it is incremental because it builds on existing methods.

The paper tackles the problem of recognizing speech from non-fluent or accented speakers by proposing data-driven approaches that detect mispronunciation patterns. These yield a 5.7% improvement in speech recognition on native English datasets and a 12.8% improvement for non-native English speakers.

Recent advances in machine learning have significantly improved speech recognition, but recognizing speech from non-fluent or accented speakers remains a challenge. Previous efforts, which rely on rule-based pronunciation patterns, have struggled to fully capture non-native errors. We propose two data-driven approaches that use speech corpora to automatically detect mispronunciation patterns. By aligning non-native phones with their native counterparts using attention maps, we achieved a 5.7% improvement in speech recognition on native English datasets and a 12.8% improvement for non-native English speakers, particularly for Korean speakers. Our method offers practical advances for robust Automatic Speech Recognition (ASR) systems, particularly in situations where prior linguistic knowledge is not applicable.
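The abstract does not say how the attention maps are turned into mispronunciation patterns. As a rough illustration only, the sketch below assumes that an attention matrix is already available for each utterance (rows are recognized non-native phones, columns are canonical native phones). It aligns each non-native phone to its most-attended native phone by argmax and counts the substitutions that recur across utterances. The function names `align_by_attention` and `mine_patterns`, the argmax alignment rule, the `min_count` threshold, and the toy phone sequences are all hypothetical and are not the authors' method.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

import numpy as np


def align_by_attention(attn: np.ndarray) -> List[int]:
    """Hypothetical alignment rule: map each non-native phone (row) to the
    native phone (column) that receives the largest attention weight."""
    return attn.argmax(axis=1).tolist()


def mine_patterns(
    utterances: Sequence[Tuple[List[str], List[str], np.ndarray]],
    min_count: int = 2,
) -> Dict[Tuple[str, str], int]:
    """Count (native phone, observed non-native phone) substitutions.

    Each utterance is (native_phones, nonnative_phones, attn), where attn has
    shape (len(nonnative_phones), len(native_phones)). Only pairs seen at
    least `min_count` times (an assumed threshold) are kept as patterns.
    """
    counts: Counter = Counter()
    for native, nonnative, attn in utterances:
        for i, j in enumerate(align_by_attention(attn)):
            if native[j] != nonnative[i]:
                counts[(native[j], nonnative[i])] += 1
    return {pair: c for pair, c in counts.items() if c >= min_count}


if __name__ == "__main__":
    # Toy, made-up attention map: each non-native phone attends mostly to
    # the native phone at the same position.
    attn = np.array([
        [0.8, 0.1, 0.1],
        [0.1, 0.8, 0.1],
        [0.1, 0.1, 0.8],
    ])
    data = [
        (["R", "AY", "S"], ["L", "AY", "S"], attn),
        (["R", "EH", "D"], ["L", "EH", "D"], attn),
    ]
    print(mine_patterns(data))  # prints {('R', 'L'): 2}
```

Argmax alignment is the simplest way to read a soft attention map as a hard alignment. A real system would likely need to handle insertions and deletions, which this one-to-one mapping ignores.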

