AS · CL · SD · Jan 10, 2020

Improving Dysarthric Speech Intelligibility Using Cycle-consistent Adversarial Training

arXiv:2001.04260v1 · 26 citations
AI Analysis

This work addresses the communication difficulties faced by millions of people with dysarthria. It is an incremental contribution that applies an existing method, Cycle-consistent GAN, to new data.

The paper tackles the low intelligibility of dysarthric speech by training a Cycle-consistent GAN to convert dysarthric speech to healthy speech, achieving a 33.4% absolute reduction in word error rate (WER) on a held-out test set.

Dysarthria is a motor speech impairment affecting millions of people. Dysarthric speech can be far less intelligible than that of non-dysarthric speakers, causing significant communication difficulties. The goal of our work is to develop a model for dysarthric-to-healthy speech conversion using Cycle-consistent GAN. The training data consist of 18,700 dysarthric and 8,610 healthy control Korean utterances that were recorded in a previous study on automatic speech recognition for a voice keyboard. The generator is trained to transform dysarthric speech to healthy speech in the spectral domain, and the result is then converted back to a speech waveform. Objective evaluation using automatic speech recognition of the generated utterances on a held-out test set shows that, after adversarial training, recognition performance improves over the original dysarthric speech: the absolute WER is lowered by 33.4%. This demonstrates that the proposed GAN-based conversion method is useful for improving dysarthric speech intelligibility.
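The evaluation metric in the abstract, WER, and the notion of an "absolute" reduction can be illustrated with a minimal sketch. It is not the paper's evaluation code: the function names word_error_rate and absolute_wer_reduction are hypothetical, the example sentences are invented English stand-ins (the study used Korean utterances), and WER is taken in its usual form as word-level edit distance divided by the number of reference words.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def absolute_wer_reduction(wer_before: float, wer_after: float) -> float:
    """Difference in WER (in the same units as the inputs), not a ratio."""
    return wer_before - wer_after


if __name__ == "__main__":
    # Invented example: ASR output before and after a hypothetical conversion step.
    reference = "turn on the light"
    wer_before = word_error_rate(reference, "turn the night")
    wer_after = word_error_rate(reference, "turn on the light")
    print(wer_before, wer_after, absolute_wer_reduction(wer_before, wer_after))
```

An absolute reduction is a difference between two WER values, so "lowered by 33.4%" in the abstract reads as 33.4 percentage points rather than a 33.4% relative improvement over the baseline WER.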

