SD · AI · AS · Jan 13, 2022

The Effectiveness of Time Stretching for Enhancing Dysarthric Speech for Improved Dysarthric Speech Recognition

arXiv:2201.04908v1 · 21 citations
AI Analysis

This work addresses the challenge of enhancing dysarthric speech so that it can be recognized more accurately, which matters for people with speech impairments. The contribution is incremental: it builds on existing methods and reports modest improvements.

The paper compares several enhancement methods for improving dysarthric speech recognition. On a phoneme recognition task, simple signal processing techniques such as stationary noise removal and time stretching achieve results comparable to state-of-the-art GAN-based voice conversion methods. A proposed combination of MaskCycleGAN-VC with time stretching further improves results for some speakers.

In this paper, we investigate several existing methods and a new state-of-the-art generative adversarial network (GAN)-based voice conversion method for enhancing dysarthric speech for improved dysarthric speech recognition. We compare key components of existing methods as part of a rigorous ablation study to find the most effective solution to improve dysarthric speech recognition. We find that straightforward signal processing methods such as stationary noise removal and vocoder-based time stretching lead to dysarthric speech recognition results comparable to those obtained when using state-of-the-art GAN-based voice conversion methods, as measured using a phoneme recognition task. Additionally, our proposed solution, a combination of MaskCycleGAN-VC and time-stretched enhancement, is able to improve the phoneme recognition results for certain dysarthric speakers compared to our time-stretched baseline.
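
To make the idea of time stretching concrete, the following is a minimal sketch of changing a signal's duration without resampling it. It is an assumption-laden illustration, not the paper's method: the abstract describes vocoder-based time stretching, whereas this sketch swaps in plain windowed overlap-add (OLA), which is simpler but can introduce phase artifacts on real speech. The function name `time_stretch_ola` and the parameters `rate`, `frame_len`, and `synth_hop` are hypothetical names chosen for this example.

```python
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def time_stretch_ola(signal, rate, frame_len=256, synth_hop=64):
    """Change duration by roughly 1/rate using plain overlap-add.

    rate < 1 lengthens the signal, rate > 1 shortens it.
    Illustrative only: this is NOT the vocoder-based method from the paper.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    window = hann(frame_len)
    # Frames are read every `analysis_hop` samples and written every `synth_hop`.
    analysis_hop = synth_hop * rate
    n_frames = max(1, int((len(signal) - frame_len) / analysis_hop) + 1)
    out_len = (n_frames - 1) * synth_hop + frame_len
    out = [0.0] * out_len
    norm = [0.0] * out_len  # summed window weights, used for normalization
    for k in range(n_frames):
        read_pos = int(round(k * analysis_hop))
        write_pos = k * synth_hop
        for i in range(frame_len):
            idx = read_pos + i
            sample = signal[idx] if idx < len(signal) else 0.0
            out[write_pos + i] += window[i] * sample
            norm[write_pos + i] += window[i]
    # Divide by the accumulated window weight so overlapping frames average out.
    return [o / n if n > 1e-8 else 0.0 for o, n in zip(out, norm)]


# Usage: a short test tone, stretched to about twice its length.
tone = [math.sin(2 * math.pi * 5 * i / 4096) for i in range(4096)]
slower = time_stretch_ola(tone, rate=0.5)
```

With `rate=1.0` the read and write positions coincide and the normalized output reproduces the input. In practice, a waveform-similarity variant (WSOLA), a phase vocoder, or a parametric vocoder would likely give cleaner results on speech than this bare OLA sketch.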

