SDAIAS Aug 11, 2025

Bridging ASR and LLMs for Dysarthric Speech Recognition: Benchmarking Self-Supervised and Generative Approaches

arXiv:2508.08027v1 | 2 citations | h-index: 6 | INTERSPEECH
Originality: Synthesis-oriented
AI Analysis

This work addresses speech recognition for individuals with dysarthria. It is an incremental improvement that applies existing methods to a specific domain.

This study tackled the problem of dysarthric speech recognition by benchmarking self-supervised ASR models and introducing LLM-enhanced decoding, finding that LLM-based decoding improves ASR by leveraging linguistic constraints for phoneme restoration and grammatical correction.

Dysarthric speech poses challenges for Automatic Speech Recognition (ASR) due to phoneme distortions and high variability. While self-supervised ASR models like Wav2Vec, HuBERT, and Whisper have shown promise, their effectiveness on dysarthric speech remains unclear. This study systematically benchmarks these models with different decoding strategies, including CTC, seq2seq, and LLM-enhanced decoding (BART, GPT-2, Vicuna). Our contributions include (1) benchmarking ASR architectures for dysarthric speech, (2) introducing LLM-based decoding to improve intelligibility, (3) analyzing generalization across datasets, and (4) providing insights into recognition errors across severity levels. Findings highlight that LLM-enhanced decoding improves dysarthric ASR by leveraging linguistic constraints for phoneme restoration and grammatical correction.
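The abstract names CTC as one of the decoding strategies without describing the procedure. As a rough illustration of greedy CTC decoding in general (not the authors' implementation), the sketch below picks the highest-scoring symbol per frame, merges consecutive repeats, and drops the blank symbol. The vocabulary, the blank symbol `_`, the function name `ctc_greedy_decode`, and the frame scores are all hypothetical and chosen only for the example.

```python
from itertools import groupby

BLANK = "_"  # hypothetical blank symbol for this sketch


def ctc_greedy_decode(frame_scores, vocab, blank=BLANK):
    """Greedy CTC decoding: argmax per frame, merge repeats, drop blanks."""
    # Best-scoring symbol for each frame.
    best = [vocab[max(range(len(row)), key=row.__getitem__)] for row in frame_scores]
    # Merge runs of identical consecutive symbols into one.
    collapsed = [symbol for symbol, _ in groupby(best)]
    # Remove blanks; a blank between two identical symbols keeps both.
    return "".join(symbol for symbol in collapsed if symbol != blank)


# Made-up per-frame scores over a toy vocabulary (illustrative only).
vocab = ["_", "c", "a", "t"]
frame_scores = [
    [0.10, 0.70, 0.10, 0.10],  # c
    [0.10, 0.60, 0.20, 0.10],  # c (repeat, merged)
    [0.80, 0.10, 0.05, 0.05],  # blank
    [0.10, 0.10, 0.70, 0.10],  # a
    [0.10, 0.10, 0.10, 0.70],  # t
    [0.10, 0.10, 0.10, 0.70],  # t (repeat, merged)
]

print(ctc_greedy_decode(frame_scores, vocab))  # cat
```

Greedy decoding of this kind uses no language knowledge. The LLM-enhanced decoding described in the abstract is aimed at that gap, using linguistic constraints to restore phonemes and correct grammar.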

