AS · AI · SD · Jun 2, 2025

Evaluating Logit-Based GOP Scores for Mispronunciation Detection

arXiv:2506.12067v2 · h-index: 40 · INTERSPEECH
Originality: Incremental advance
AI Analysis

This work addresses pronunciation assessment for L2 English learners. Its contribution is incremental: it compares logit-based and probability-based GOP scores and suggests hybrid GOP methods.

The study compared logit-based and probability-based goodness of pronunciation (GOP) scores for mispronunciation detection. Logit-based methods outperformed probability-based GOP in classification, although their effectiveness depended on dataset characteristics, and the maximum logit GOP showed the strongest alignment with human perception.

Pronunciation assessment relies on goodness of pronunciation (GOP) scores, traditionally derived from softmax-based posterior probabilities. However, posterior probabilities may suffer from overconfidence and poor phoneme separation, limiting their effectiveness. This study compares logit-based GOP scores with probability-based GOP scores for mispronunciation detection. We conducted our experiment on two L2 English speech datasets spoken by Dutch and Mandarin speakers, assessing classification performance and correlation with human ratings. Logit-based methods outperform probability-based GOP in classification, but their effectiveness depends on dataset characteristics. The maximum logit GOP shows the strongest alignment with human perception, while a combination of different GOP scores balances probability and logit features. The findings suggest that hybrid GOP methods incorporating uncertainty modeling and phoneme-specific weighting improve pronunciation assessment.
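The contrast between the two score families can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so the definitions here are assumptions chosen for illustration: the probability-based score is taken as the mean log softmax posterior of the target phoneme over a segment's frames, and the maximum logit score as the largest raw target logit in the segment. The function names and the toy logit values are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one frame's phoneme logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gop_probability(frames, target):
    """Assumed probability-based GOP: mean log posterior of the target phoneme."""
    return sum(math.log(softmax(f)[target]) for f in frames) / len(frames)


def gop_max_logit(frames, target):
    """Assumed maximum logit GOP: largest raw target logit in the segment."""
    return max(f[target] for f in frames)


# Two hypothetical segments (frames x phoneme logits), target phoneme index 0.
# The second segment is the first shifted down by 4 in every logit.
strong = [[8.0, 1.0, 0.0], [9.0, 1.5, 0.5]]
weak = [[4.0, -3.0, -4.0], [5.0, -2.5, -3.5]]

for name, seg in (("strong", strong), ("weak", weak)):
    print(name, gop_probability(seg, 0), gop_max_logit(seg, 0))
```

Because softmax is invariant to adding a constant to all logits, both toy segments receive the same probability-based score, while the maximum logit score still separates them. This is only one way raw logits can carry information that posteriors discard; it is not a reproduction of the paper's experiments.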
