CL, SD, AS · Jun 1, 2025

Mispronunciation Detection Without L2 Pronunciation Dataset in Low-Resource Setting: A Case Study in Finland Swedish

arXiv:2506.01156v1 · 1 citation · h-index: 8 · Has Code · INTERSPEECH
Originality: Incremental advance
AI Analysis

This work provides a mispronunciation detection tool for learners of low-resource language varieties such as Finland Swedish. It is incremental in that it adapts existing methods and requires only minimal L2 data.

The paper tackles mispronunciation detection for Finland Swedish, a low-resource language variety, by training a model on 89 hours of L1 speech and testing it on 33 minutes of L2 speech. The model reaches 43.2% Recall and 29.8% Precision, a more balanced trade-off than the baseline's 77.5% Recall and 17.6% Precision.
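The summary describes the result as a "balance" but reports no single combined score. As a rough check, and as our own arithmetic rather than a figure from the paper, the F1 score (the harmonic mean of Precision and Recall) can be computed from the reported numbers. Treating F1 as the right summary of this trade-off is an assumption; the paper may weight the two metrics differently.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures reported in the summary/abstract, expressed as fractions.
proposed = f1_score(precision=0.298, recall=0.432)
baseline = f1_score(precision=0.176, recall=0.775)
print(f"proposed F1 ~ {proposed:.3f}, baseline F1 ~ {baseline:.3f}")
# prints "proposed F1 ~ 0.353, baseline F1 ~ 0.287"
```

Under this measure the proposed model trades Recall for Precision and comes out ahead overall, which is consistent with the "balance" framing.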

Mispronunciation detection (MD) models are a cornerstone of many language learning applications. Unfortunately, most systems are built for English and other major languages, while low-resourced language varieties, such as Finland Swedish (FS), lack such tools. In this paper, we introduce our MD model for FS, trained on 89 hours of first language (L1) speakers' spontaneous speech and tested on 33 minutes of transcribed read-aloud speech from second language (L2) speakers. We trained a multilingual wav2vec 2.0 model with entropy regularization, then applied temperature scaling and top-k normalization to its outputs after inference to better adapt it for MD. The main novelty of our method lies in its simplicity: it requires minimal L2 data. The process is also language-independent, making it suitable for other low-resource languages. Our proposed algorithm allows us to balance Recall (43.2%) and Precision (29.8%), compared with the baseline model's Recall (77.5%) and Precision (17.6%).
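The abstract names the training and post-inference steps but not their exact form. The NumPy sketch below shows one plausible reading, and every function name in it is hypothetical. Temperature scaling divides the per-frame logits by a temperature T before the softmax. Top-k normalization keeps the k largest probabilities per frame and renormalizes them to sum to 1. The entropy regularizer is written as a confidence penalty (the task loss minus beta times the mean output entropy); that sign and form are assumptions, not the paper's stated loss. The decision rule, which flags a canonical phone when its posterior falls below a threshold, is also hypothetical and assumes the posteriors are already aligned with one row per canonical phone.

```python
import numpy as np


def temperature_softmax(logits: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    """Softmax over the last axis after dividing logits by a temperature T > 0."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def top_k_normalize(probs: np.ndarray, k: int) -> np.ndarray:
    """Keep the k largest probabilities per row, zero the rest, renormalize to sum to 1."""
    k = min(k, probs.shape[-1])
    # Indices of the k largest entries along the last axis (unordered within the k).
    top_idx = np.argpartition(probs, -k, axis=-1)[..., -k:]
    mask = np.zeros(probs.shape, dtype=bool)
    np.put_along_axis(mask, top_idx, True, axis=-1)
    kept = np.where(mask, probs, 0.0)
    return kept / kept.sum(axis=-1, keepdims=True)


def mean_entropy(probs: np.ndarray, eps: float = 1e-12) -> float:
    """Average per-row entropy H(p) = -sum_c p_c log p_c, in nats."""
    return float(-(probs * np.log(probs + eps)).sum(axis=-1).mean())


def entropy_regularized_loss(task_loss: float, probs: np.ndarray, beta: float) -> float:
    """Hypothetical confidence penalty: subtracting beta * H(p) rewards less peaked outputs."""
    return task_loss - beta * mean_entropy(probs)


def flag_mispronunciations(probs: np.ndarray, canonical_ids, threshold: float) -> np.ndarray:
    """Hypothetical rule: flag row i if the posterior of its expected phone is below threshold."""
    ids = np.asarray(canonical_ids)
    expected = probs[np.arange(len(ids)), ids]
    return expected < threshold


# Toy example: 2 aligned segments, 3 phone classes (values are illustrative only).
logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.2]])
probs = top_k_normalize(temperature_softmax(logits, temperature=2.0), k=2)
flags = flag_mispronunciations(probs, canonical_ids=[0, 2], threshold=0.3)
print(flags)  # [False  True]
```

In the toy example, top-k normalization zeroes the expected phone of the second segment because it is not among that frame's two most likely classes, so that segment is flagged. Raising T flattens the distribution before truncation, and k controls how many competing phones keep any probability mass. Both are knobs for shifting the Recall and Precision trade-off, though the paper's actual settings are not given here.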

Code Implementations: 1 repo
