CL · AI · AS · Jun 24, 2025

JCAPT: A Joint Modeling Approach for CAPT

arXiv:2506.19315v2 · h-index: 1 · Slate
Originality: Incremental advance
AI Analysis

This work addresses pronunciation feedback for second language learners, presenting an incremental improvement through joint modeling.

The paper tackles computer-assisted pronunciation training by jointly modeling automatic pronunciation assessment and mispronunciation detection and diagnosis. The resulting model consistently outperforms prior methods on the speechocean762 benchmark, especially for mispronunciation detection.

Effective pronunciation feedback is critical in second language (L2) learning, for which computer-assisted pronunciation training (CAPT) systems often encompass two key tasks: automatic pronunciation assessment (APA) and mispronunciation detection and diagnosis (MDD). Recent work has shown that joint modeling of these two tasks can yield mutual benefits. Our unified framework leverages Mamba, a selective state space model (SSM), while integrating phonological features and think token strategies to jointly enhance interpretability and fine-grained temporal reasoning in APA and MDD. To our knowledge, this is the first study to combine phonological attribution, SSM-based modeling, and prompting in CAPT. A series of experiments conducted on the speechocean762 benchmark demonstrate that our model consistently outperforms prior methods, particularly on the MDD task.
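
As a rough illustration of what "joint modeling" of APA and MDD can mean at the loss level, the sketch below sums a regression loss over pronunciation scores and a classification loss over per-phone diagnoses. This is not the paper's objective: the function names, the choice of mean squared error and cross-entropy, and the `mdd_weight` parameter are all assumptions made for illustration, and the Mamba encoder, phonological features, and think tokens are left out entirely.

```python
import math

def mse(pred, target):
    """Mean squared error, used here as a stand-in APA score-regression loss."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def cross_entropy(logits, label):
    """Cross-entropy of one phone's logits against the phone actually produced."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[label]

def joint_loss(apa_pred, apa_target, mdd_logits, mdd_labels, mdd_weight=1.0):
    """Hypothetical joint objective: APA loss plus a weighted MDD loss."""
    apa = mse(apa_pred, apa_target)
    mdd = sum(cross_entropy(l, y) for l, y in zip(mdd_logits, mdd_labels)) / len(mdd_labels)
    return apa + mdd_weight * mdd
```

In a setup like this, both terms would typically be computed from a shared encoder's outputs, so gradients from each task shape the same representation; that sharing is one common way joint training yields mutual benefits.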

