SD CL AS QM

Oct 15, 2021

Towards Identity Preserving Normal to Dysarthric Voice Conversion

Wen-Chin Huang, Bence Mark Halpern, Lester Phillip Violeta, Odette Scharenborg, Tomoki Toda

arXiv:2110.08213v1 · 9.829 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses a domain-specific problem relevant to clinical decision-making and to data augmentation for dysarthric speech recognition. It is incremental, however: it builds on existing voice conversion methods and achieves only limited speaker similarity.

The paper tackles the problem of converting normal speech to dysarthric speech while preserving speaker identity, using a two-stage framework. Results on the UASpeech dataset show reasonable naturalness and an ability to capture severity aspects, but limited similarity to the source speaker's voice.

We present a voice conversion framework that converts normal speech into dysarthric speech while preserving the speaker identity. Such a framework is essential for (1) clinical decision-making processes and alleviation of patient stress, and (2) data augmentation for dysarthric speech recognition. This is an especially challenging task since the converted samples should capture the severity of dysarthric speech while being highly natural and possessing the speaker identity of the normal speaker. To this end, we adopted a two-stage framework, which consists of a sequence-to-sequence model and a nonparallel frame-wise model. Objective and subjective evaluations were conducted on the UASpeech dataset, and results showed that the method was able to yield reasonable naturalness and capture severity aspects of the pathological speech. On the other hand, the similarity to the normal source speaker's voice was limited and requires further improvements.
