SDAIAS | Jun 19, 2025

Improved Intelligibility of Dysarthric Speech using Conditional Flow Matching

arXiv:2506.16127v1 | 2 citations | h-index: 2 | INTERSPEECH
Originality: Incremental advance
AI Analysis

This work addresses the communication challenges faced by individuals with dysarthria, offering an incremental improvement in speech conversion techniques.

The paper addresses dysarthric-to-regular speech conversion as a way to improve intelligibility. It proposes a non-autoregressive method that combines Conditional Flow Matching with Diffusion Transformers and discrete acoustic units, and reports faster convergence and better intelligibility than mel-spectrogram-based approaches.

Dysarthria is a neurological disorder that significantly impairs speech intelligibility, often rendering affected individuals unable to communicate effectively. This necessitates the development of robust dysarthric-to-regular speech conversion techniques. In this work, we investigate the utility and limitations of self-supervised learning (SSL) features and their quantized representations as an alternative to mel-spectrograms for speech generation. Additionally, we explore methods to mitigate speaker variability by generating clean speech in a single-speaker voice using features extracted from WavLM. To this end, we propose a fully non-autoregressive approach that leverages Conditional Flow Matching (CFM) with Diffusion Transformers to learn a direct mapping from dysarthric to clean speech. Our findings highlight the effectiveness of discrete acoustic units in improving intelligibility while achieving faster convergence compared to traditional mel-spectrogram-based approaches.
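
As a rough illustration of the Conditional Flow Matching idea mentioned in the abstract, the sketch below shows the training target and an Euler sampler in plain Python. It assumes the commonly used linear interpolation path between a noise sample and a target feature vector; the abstract does not state which path the authors use. The function names (`cfm_training_pair`, `cfm_loss`, `euler_sample`) and the `velocity_fn` callable are hypothetical stand-ins: in the paper the velocity model would be a Diffusion Transformer conditioned on features of the dysarthric input, which is not reproduced here.

```python
def cfm_training_pair(x0, x1, t):
    """Point on the linear path from x0 (noise) to x1 (target) at time t,
    together with the velocity the model should predict at that point.
    Assumption: linear path x_t = (1 - t) * x0 + t * x1."""
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    u_t = [b - a for a, b in zip(x0, x1)]  # constant along a linear path
    return x_t, u_t


def cfm_loss(velocity_fn, x0, x1, t, cond):
    """Mean squared error between the predicted and target velocity.
    velocity_fn(x_t, t, cond) is a hypothetical stand-in for the network;
    cond stands in for the conditioning features of the source speech."""
    x_t, u_t = cfm_training_pair(x0, x1, t)
    v = velocity_fn(x_t, t, cond)
    return sum((p - q) ** 2 for p, q in zip(v, u_t)) / len(u_t)


def euler_sample(velocity_fn, x0, cond, steps=10):
    """Integrate dx/dt = velocity_fn(x, t, cond) from t = 0 to t = 1
    with fixed-step Euler updates, starting from the noise sample x0."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        v = velocity_fn(x, t, cond)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x
```

With a velocity function that returns the true difference `x1 - x0`, the loss is zero and the sampler lands on `x1`; a trained model only approximates this. Generation needs a fixed number of velocity evaluations rather than one step per output frame, which is what makes the approach non-autoregressive.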

