Sep 12, 2020

Visual-speech Synthesis of Exaggerated Corrective Feedback

Yaohua Bu, Weijun Li, Tianyi Ma, Shengqi Chen, Jia Jia, Kun Li, Xiaobo Lu

arXiv:2009.05748v2 · 2.31 citations

Originality: Incremental advance

AI Analysis

This work addresses pronunciation training for second-language learners. It is an incremental advance because it builds on existing methods such as Tacotron and ADC Viseme Blending.

The paper tackles the problem of giving second-language learners more discriminative feedback so they can identify their mispronunciations. It proposes exaggerated visual-speech feedback for computer-assisted pronunciation training, and user studies show that this feedback outperforms a non-exaggerated version in helping learners both identify and improve their pronunciation.

To provide more discriminative feedback that helps second language (L2) learners better identify their mispronunciations, we propose a method for exaggerated visual-speech feedback in computer-assisted pronunciation training (CAPT). The speech exaggeration is realized by an emphatic speech generation neural network based on Tacotron, while the visual exaggeration is accomplished by ADC Viseme Blending: increasing the Amplitude of movement, extending the phone's Duration, and enhancing the color Contrast. User studies show that exaggerated feedback outperforms the non-exaggerated version in helping learners with pronunciation identification and pronunciation improvement.
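The abstract names the three ADC operations (Amplitude, Duration, Contrast) but not how they are computed. The sketch below is a hypothetical illustration only, not the paper's implementation. It assumes a viseme is a sequence of mouth-shape parameter frames plus a display colour. The `Viseme` type, the factors `alpha`, `beta` and `gamma`, and the concrete operations (scaling displacement from a neutral pose, linear-interpolation time-stretching, and a contrast stretch around mid-grey) are all assumptions chosen for clarity.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical representation (assumption): a viseme is a list of frames,
# each frame a list of mouth-shape parameters, plus an RGB colour in [0, 1].
@dataclass
class Viseme:
    frames: List[List[float]]
    color: Tuple[float, float, float]

def exaggerate_amplitude(frames, neutral, alpha):
    """A: scale each frame's displacement from the neutral pose by alpha (> 1 exaggerates).
    Results are not clipped, so parameters may exceed their usual range."""
    return [[n + alpha * (f - n) for f, n in zip(frame, neutral)] for frame in frames]

def extend_duration(frames, beta):
    """D: time-stretch the frame sequence by factor beta using linear interpolation."""
    n = len(frames)
    if n == 0:
        return []
    if n == 1:
        # A single frame is simply held for round(beta) frames (at least one).
        return [list(frames[0]) for _ in range(max(1, round(beta)))]
    m = max(2, round(n * beta))  # number of output frames
    out = []
    for i in range(m):
        t = i * (n - 1) / (m - 1)  # position on the original timeline
        lo = int(t)
        hi = min(lo + 1, n - 1)
        w = t - lo  # interpolation weight between frames lo and hi
        out.append([(1 - w) * a + w * b for a, b in zip(frames[lo], frames[hi])])
    return out

def enhance_contrast(color, gamma):
    """C: push each channel away from mid-grey 0.5 by gamma, clipped to [0, 1]."""
    return tuple(min(1.0, max(0.0, 0.5 + gamma * (c - 0.5))) for c in color)

def adc_exaggerate(v, neutral, alpha=1.5, beta=2.0, gamma=1.5):
    """Apply the three hypothetical ADC steps to one viseme."""
    frames = exaggerate_amplitude(v.frames, neutral, alpha)
    frames = extend_duration(frames, beta)
    return Viseme(frames=frames, color=enhance_contrast(v.color, gamma))

if __name__ == "__main__":
    v = Viseme(frames=[[0.2, 0.4], [0.4, 0.6]], color=(0.6, 0.4, 0.5))
    out = adc_exaggerate(v, neutral=[0.0, 0.0], alpha=1.5, beta=2.0, gamma=2.0)
    print(len(out.frames), out.color)
```

Applying the amplitude step before the duration step means the time-stretch interpolates between already-exaggerated poses; the opposite order gives the same result here because both steps are linear, but the order would matter if clipping were added to the amplitude step.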

