CLAS · Aug 20, 2024

Disentangling segmental and prosodic factors to non-native speech comprehensibility

arXiv:2408.10997v1 · 3 citations · h-index: 2
Originality: Incremental advance
AI Analysis

This work offers a tool for studying how segmental and prosodic cues affect the comprehensibility of non-native speech and social attitudes towards it. As an accent conversion system, it is an incremental advance.

The researchers disentangle segmental and prosodic factors in non-native speech to assess the impact of each on comprehensibility. Their perceptual listening tests provide quantitative evidence that segmental features have a larger effect than prosody.

Current accent conversion (AC) systems do not disentangle the two main sources of non-native accent: segmental and prosodic characteristics. Being able to manipulate a non-native speaker's segmental and/or prosodic channels independently is critical to quantify how these two channels contribute to speech comprehensibility and social attitudes. We present an AC system that not only decouples voice quality from accent, but also disentangles the latter into its segmental and prosodic characteristics. The system is able to generate accent conversions that combine (1) the segmental characteristics from a source utterance, (2) the voice characteristics from a target utterance, and (3) the prosody of a reference utterance. We show that vector quantization of acoustic embeddings and removal of consecutive duplicated codewords allow the system to transfer prosody and improve voice similarity. We conduct perceptual listening tests to quantify the individual contributions of segmental features and prosody to the perceived comprehensibility of non-native speech. Our results indicate that, contrary to prior research in non-native speech, segmental features have a larger impact on comprehensibility than prosody. The proposed AC system may also be used to study how segmental and prosodic cues affect social attitudes towards non-native speech.
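
The two operations named in the abstract, vector quantization of frame-level embeddings and removal of consecutive duplicated codewords, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names `quantize` and `collapse_repeats`, the 2-dimensional toy codebook, and the nearest-neighbour (squared Euclidean) assignment are hypothetical and are not taken from the paper's implementation. One intuition, offered tentatively, is that collapsing runs of identical codes discards how long each code was held, so duration information no longer travels with the segmental content.

```python
from itertools import groupby


def quantize(frames, codebook):
    """Map each frame embedding to the index of its nearest codeword.

    Nearest is measured by squared Euclidean distance (an assumption;
    the paper's quantizer may differ).
    """
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(range(len(codebook)), key=lambda k: sqdist(frame, codebook[k]))
        for frame in frames
    ]


def collapse_repeats(codes):
    """Remove consecutive duplicates, keeping one code per run."""
    return [code for code, _run in groupby(codes)]


# Toy data (hypothetical): three codewords, seven frames.
codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]]
frames = [
    [0.1, -0.1], [0.2, 0.0],               # two frames near codeword 0
    [0.9, 1.2], [1.1, 0.8], [1.0, 1.0],    # three frames near codeword 1
    [3.8, 0.1],                            # one frame near codeword 2
    [0.0, 0.1],                            # back to codeword 0
]

codes = quantize(frames, codebook)
collapsed = collapse_repeats(codes)
print(codes)      # [0, 0, 1, 1, 1, 2, 0]
print(collapsed)  # [0, 1, 2, 0]
```

Note that only consecutive repeats are merged: the final `0` survives because it is separated from the first run of `0`s, so the order of the code sequence is preserved while run lengths are dropped.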

