AI · Jul 29, 2025

The Interspeech 2025 Speech Accessibility Project Challenge

Xiuwen Zheng, Bornali Phukon, Jonghwan Na, Ed Cutrell, Kyu Han, Mark Hasegawa-Johnson, Pan-Pan Jiang, Aadhrik Kuila, Colin Lea, Bob MacDonald, Gautam Mantena, Venkatesh Ravichandran

arXiv:2507.22047v1 · 8 citations · h-index: 44 · INTERSPEECH

Originality: Synthesis-oriented

AI Analysis

This work addresses poor ASR accuracy for people with speech disabilities, but its contribution is incremental because it centers on a single challenge and dataset.

The Interspeech 2025 Speech Accessibility Project Challenge tackled the problem of inadequate ASR performance for individuals with speech disabilities by using over 400 hours of data from 500+ people, resulting in 12 out of 22 teams outperforming a baseline in WER and 17 in SemScore, with the top team achieving a WER of 8.11% and SemScore of 88.44%.

While the last decade has witnessed significant advancements in Automatic Speech Recognition (ASR) systems, their performance for individuals with speech disabilities remains inadequate, partly due to limited public training data. To bridge this gap, the Interspeech 2025 Speech Accessibility Project (SAP) Challenge was launched, using over 400 hours of SAP data collected and transcribed from more than 500 individuals with diverse speech disabilities. Hosted on EvalAI with a remote evaluation pipeline, the SAP Challenge evaluates submissions on Word Error Rate (WER) and Semantic Score (SemScore). Of 22 valid teams, 12 outperformed the whisper-large-v2 baseline on WER and 17 surpassed it on SemScore. Notably, the top team achieved both the lowest WER, 8.11%, and the highest SemScore, 88.44%, setting new benchmarks for future ASR systems in recognizing impaired speech.
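
For readers unfamiliar with the first metric, WER is conventionally the word-level edit distance (substitutions, deletions, and insertions) between the reference transcript and the system output, divided by the number of reference words. The Python sketch below illustrates that standard definition only. It is not the challenge's scoring code: the function name `word_error_rate` and the lowercase-and-split normalization are assumptions made for illustration, and the text normalization the challenge actually applied is not described here. SemScore is not sketched because its definition is not given in this summary.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: lowercasing and whitespace splitting stand in for
    whatever text normalization a real evaluation would apply.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


# One substituted word out of four reference words.
print(word_error_rate("turn on the light", "turn on a light"))  # 0.25
```

Under this definition, the reported top-team WER of 8.11% corresponds to roughly eight word errors per hundred reference words. Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.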
