AS | AI | LG | SD | May 25, 2025

WhisperD: Dementia Speech Recognition and Filler Word Detection with Whisper

arXiv:2505.21551v1 | 1 citation | h-index: 1 | Has Code | INTERSPEECH
Originality: Incremental advance
AI Analysis

This work addresses the need for accurate transcription of dementia speech, which supports cost-effective diagnosis and the development of assistive technology. The advance is incremental because it builds on existing Whisper models.

The paper addresses Whisper's poor transcription of dementia speech, which stems from irregular speech patterns and disfluencies, by fine-tuning the model on dementia datasets. The result is a word error rate of 0.24 and improved filler word detection.

Whisper fails to correctly transcribe dementia speech because persons with dementia (PwDs) often exhibit irregular speech patterns and disfluencies such as pauses, repetitions, and fragmented sentences. Whisper was trained on standard speech and may have had little or no exposure to dementia-affected speech. However, correct transcription of dementia speech is vital for cost-effective diagnosis and the development of assistive technology. In this work, we fine-tune Whisper with the open-source dementia speech dataset (DementiaBank) and our in-house dataset to improve its word error rate (WER). The fine-tuning data also includes filler words, so that the filler inclusion rate (FIR) and F1 score can be measured. The fine-tuned models significantly outperformed the off-the-shelf models. The medium-sized model achieved a WER of 0.24, outperforming previous work. The fine-tuned models also generalised notably well to unseen data and speech patterns.
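To make the metrics in the abstract concrete, the following is a minimal sketch of how WER and a filler-word F1 score might be computed for one reference/hypothesis pair. It is not the authors' evaluation code. The filler list `FILLERS`, the whitespace tokenizer, and the count-based matching of fillers are all assumptions made for illustration. FIR is left out because its exact definition is not given in this summary.

```python
from collections import Counter

# Hypothetical filler inventory (assumption); the paper's actual list is not given here.
FILLERS = {"uh", "um", "er", "hmm"}


def tokenize(text):
    """Lowercase and split on whitespace (assumption: no punctuation handling)."""
    return text.lower().split()


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = tokenize(reference), tokenize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev holds the edit distances for the previous reference prefix.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def filler_f1(reference, hypothesis):
    """F1 over filler tokens, matched by count (assumed definition, ignores position)."""
    ref_counts = Counter(t for t in tokenize(reference) if t in FILLERS)
    hyp_counts = Counter(t for t in tokenize(hypothesis) if t in FILLERS)
    matched = sum((ref_counts & hyp_counts).values())
    if matched == 0:
        return 0.0
    precision = matched / sum(hyp_counts.values())
    recall = matched / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


reference = "um the boy is uh taking the cookie"
hypothesis = "the boy is uh taking a cookie"
print(wer(reference, hypothesis))  # 0.25: one deletion and one substitution over 8 words
print(round(filler_f1(reference, hypothesis), 3))
```

In this made-up example the hypothesis drops "um" and replaces "the" with "a", so two edits over eight reference words give a WER of 0.25. A WER of 0.24, as reported for the medium-sized model, corresponds to roughly one word error in every four reference words.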

