The Complementary Roles of Non-Verbal Cues for Robust Pronunciation Assessment
This work addresses pronunciation assessment for non-native speakers, offering an incremental improvement by integrating previously neglected non-verbal cues.
The study tackled the problem of pronunciation assessment by incorporating non-verbal cues alongside conventional speech features, resulting in a framework that matched or outperformed existing methods.
Research on pronunciation assessment systems focuses on the phonetic and phonological aspects of non-native (L2) speech, often neglecting the rich layer of information hidden within non-verbal cues. In this study, we propose a novel pronunciation assessment framework, IntraVerbalPA. The framework incorporates both fine-grained frame-level and abstract utterance-level non-verbal cues alongside conventional speech and phoneme representations. Additionally, we introduce a "Goodness of phonemic-duration" metric to effectively model the duration distribution within the framework. Our results validate the effectiveness of the proposed IntraVerbalPA framework and its individual components, yielding performance that either matches or outperforms existing work.