CL · SD · AS · Nov 1, 2021

A transfer learning based approach for pronunciation scoring

arXiv:2111.00976v2 · 14 citations
AI Analysis

This work addresses accurate pronunciation scoring for language learners. It is incremental, since it builds on existing ASR and transfer learning methods.

The paper tackles phone-level pronunciation scoring by adapting an ASR model to the task through transfer learning. On EpaDB, the final system is 20% better than a state-of-the-art GOP system, measured with a cost function that prioritizes low rates of unnecessary corrections.
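For orientation, a GOP (goodness of pronunciation) baseline turns an ASR model's frame-level phone posteriors into one score per phone. The sketch below shows one common posterior-based variant: the average log posterior of the canonical (expected) phone over the frames aligned to it. This is an illustrative assumption, not necessarily the exact GOP formulation used as the baseline in the paper; the function name `gop_score` and the dictionary-of-posteriors input format are hypothetical.

```python
import math
from typing import Dict, List


def gop_score(frame_posteriors: List[Dict[str, float]], canonical_phone: str) -> float:
    """Average log posterior of the canonical phone over its aligned frames.

    Assumed input (hypothetical format): one dict per frame mapping phone
    labels to posterior probabilities produced by an ASR acoustic model.
    Higher (closer to 0) means the audio matches the expected phone better.
    """
    if not frame_posteriors:
        raise ValueError("phone segment has no frames")
    total = 0.0
    for posteriors in frame_posteriors:
        p = posteriors.get(canonical_phone, 0.0)
        total += math.log(max(p, 1e-10))  # floor the probability to avoid log(0)
    return total / len(frame_posteriors)  # normalize by segment duration in frames


# Usage: a 3-frame segment where the learner was expected to say /a/.
segment = [{"a": 0.9, "e": 0.1}, {"a": 0.8, "e": 0.2}, {"a": 0.7, "e": 0.3}]
print(gop_score(segment, "a"))  # close to 0: likely pronounced correctly
print(gop_score(segment, "e"))  # much lower: /e/ would look mispronounced
```

A phone is then typically flagged for correction when its score falls below a tuned threshold.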

Phone-level pronunciation scoring is a challenging task, with performance far from that of human annotators. Standard systems generate a score for each phone in a phrase using models trained for automatic speech recognition (ASR) with native data only. Better performance has been shown when using systems that are trained specifically for the task using non-native data. Yet, such systems face the challenge that datasets labelled for this task are scarce and usually small. In this paper, we present a transfer learning-based approach that leverages a model trained for ASR, adapting it for the task of pronunciation scoring. We analyze the effect of several design choices and compare the performance with a state-of-the-art goodness of pronunciation (GOP) system. Our final system is 20% better than the GOP system on EpaDB, a database for pronunciation scoring research, for a cost function that prioritizes low rates of unnecessary corrections.
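The abstract evaluates systems with "a cost function that prioritizes low rates of unnecessary corrections". An unnecessary correction is a correctly pronounced phone that the system flags anyway (a false positive). The sketch below shows one generic way such a cost can be built and used to pick a decision threshold: a weighted sum of the false-positive and false-negative rates, with the false-positive weight set higher. The weights 0.75/0.25 and all function names are hypothetical placeholders, not the values or code used in the paper.

```python
from typing import Optional, Sequence, Tuple


def error_rates(scores: Sequence[float], labels: Sequence[int], threshold: float) -> Tuple[float, float]:
    """Return (false-positive rate, false-negative rate) at a threshold.

    labels: 1 = phone is mispronounced (should be corrected), 0 = correct.
    A phone is flagged for correction when its score is below the threshold.
    """
    fp = fn = n_pos = n_neg = 0
    for score, label in zip(scores, labels):
        flagged = score < threshold
        if label == 1:
            n_pos += 1
            fn += not flagged  # missed a real error
        else:
            n_neg += 1
            fp += flagged  # unnecessary correction
    fpr = fp / n_neg if n_neg else 0.0
    fnr = fn / n_pos if n_pos else 0.0
    return fpr, fnr


def weighted_cost(fpr: float, fnr: float, w_fp: float = 0.75, w_fn: float = 0.25) -> float:
    # Hypothetical weights: w_fp > w_fn penalizes unnecessary corrections more.
    return w_fp * fpr + w_fn * fnr


def best_threshold(scores: Sequence[float], labels: Sequence[int],
                   w_fp: float = 0.75, w_fn: float = 0.25) -> Tuple[float, float]:
    """Search every distinct cut point and return (threshold, minimum cost)."""
    # Each observed score as threshold flags everything strictly below it;
    # +inf flags every phone, so all possible decision boundaries are covered.
    candidates = sorted(set(scores)) + [float("inf")]
    best: Optional[Tuple[float, float]] = None
    for t in candidates:
        cost = weighted_cost(*error_rates(scores, labels, t), w_fp, w_fn)
        if best is None or cost < best[1]:
            best = (t, cost)
    return best


# Usage: with FP-heavy weights, the search avoids flagging the correct phone (-1.5).
print(best_threshold([-1.0, -1.5, -2.0], [1, 0, 1]))
```

Swapping the weights makes the same search accept more unnecessary corrections in exchange for catching more real errors, which is why the choice of cost function matters when comparing systems.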

Code Implementations: 1 repo