Mispronunciation Detection in Non-native (L2) English with Uncertainty Modeling
This work addresses a key issue in language-learning tools for non-native English speakers, though it is an incremental improvement over existing methods.
The paper tackles the problem of false alarms in automatic mispronunciation detection for language learners. It models uncertainty in phoneme recognition and allows for multiple valid pronunciations, which yields up to an 18% relative improvement in precision.
A common approach to the automatic detection of mispronunciations in language learning is to recognize the phonemes produced by a student and compare them to the expected pronunciation of a native speaker. This approach makes two simplifying assumptions: (a) phonemes can be recognized from speech with high accuracy, and (b) there is a single correct way to pronounce a sentence. These assumptions do not always hold, which can produce a significant number of false mispronunciation alarms. We propose a novel approach that overcomes this problem based on two principles: (a) taking into account uncertainty in the automatic phoneme recognition step, and (b) accounting for the fact that there may be multiple valid pronunciations. We evaluate the model on non-native (L2) English speech from German, Italian and Polish speakers, where it increases the precision of mispronunciation detection by up to 18% (relative) compared to the common approach.