Gesture-Aware Zero-Shot Speech Recognition for Patients with Language Disorders
This work addresses communication challenges for individuals with language disorders by developing more effective assistive technologies, though its contribution is incremental in that it builds on existing multimodal and zero-shot learning approaches.
The paper tackles the problem of improving speech recognition for patients with language disorders by integrating gesture information into a gesture-aware ASR system, which significantly enhances semantic understanding.
Individuals with language disorders often face significant communication challenges due to their limited language processing and comprehension abilities, which also affect their interactions with voice-assisted systems that mostly rely on Automatic Speech Recognition (ASR). Despite advancements in ASR that address disfluencies, little attention has been paid to integrating non-verbal communication methods, such as gestures, on which individuals with language disorders rely substantially to supplement their communication. Recognizing the need to interpret the latent meanings of visual information not captured by speech alone, we propose a gesture-aware ASR system that uses a multimodal large language model with zero-shot learning for individuals with speech impairments. Our experimental results and analyses show that including gesture information significantly enhances semantic understanding. This study can help develop effective communication technologies specifically designed to meet the unique needs of individuals with language impairments.