AS CL May 19, 2023

A New Benchmark of Aphasia Speech Recognition and Detection Based on E-Branchformer and Multi-task Learning

Jiyang Tang, William Chen, Xuankai Chang, Shinji Watanabe, Brian MacWhinney

arXiv:2305.13331v14.314 citations | Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses the processing of disordered speech from patients with aphasia, providing a standardized benchmark and tools to facilitate research in this area.

The paper tackles aphasia speech recognition and detection by introducing a new benchmark using multi-task learning with the CTC/Attention architecture, achieving state-of-the-art speaker-level detection accuracy of 97.3% and an 11% relative WER reduction for moderate aphasia patients.

Aphasia is a language disorder that affects the speaking ability of millions of patients. This paper presents a new benchmark for aphasia speech recognition and detection tasks using state-of-the-art speech recognition techniques with the AphasiaBank dataset. Specifically, we introduce two multi-task learning methods based on the CTC/Attention architecture to perform both tasks simultaneously. Our system achieves state-of-the-art speaker-level detection accuracy (97.3%) and a relative WER reduction of 11% for moderate aphasia patients. In addition, we demonstrate the generalizability of our approach by applying it to another disordered speech database, the DementiaBank Pitt corpus. We will make our all-in-one recipes and pre-trained model publicly available to facilitate reproducibility. Our standardized data preprocessing pipeline and open-source recipes enable researchers to compare results directly, promoting progress in disordered speech processing.
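The two headline metrics in the abstract, relative WER reduction and speaker-level detection accuracy, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names are hypothetical, the WER values and speaker IDs are invented, and aggregating utterance-level predictions by majority vote is only one plausible way to obtain a speaker-level decision, not necessarily the method the paper uses.

```python
from collections import Counter, defaultdict


def relative_wer_reduction(baseline_wer, new_wer):
    """Relative reduction: (baseline - new) / baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


def speaker_level_predictions(utterance_preds):
    """Aggregate (speaker_id, label) pairs into one label per speaker.

    Assumption: simple majority vote; ties go to the label seen first.
    """
    votes = defaultdict(Counter)
    for speaker_id, label in utterance_preds:
        votes[speaker_id][label] += 1
    return {spk: counts.most_common(1)[0][0] for spk, counts in votes.items()}


def accuracy(predicted, reference):
    """Fraction of reference speakers whose predicted label matches."""
    correct = sum(predicted.get(spk) == label for spk, label in reference.items())
    return correct / len(reference)


if __name__ == "__main__":
    # Hypothetical WERs (percent), chosen only to illustrate the arithmetic.
    print(f"relative WER reduction: {relative_wer_reduction(40.0, 35.6):.2%}")

    # Hypothetical utterance-level detector outputs.
    utterances = [
        ("spk1", "aphasia"), ("spk1", "aphasia"), ("spk1", "control"),
        ("spk2", "control"), ("spk2", "control"),
    ]
    preds = speaker_level_predictions(utterances)
    truth = {"spk1": "aphasia", "spk2": "control"}
    print(f"speaker-level accuracy: {accuracy(preds, truth):.1%}")
```

Note that a relative reduction is measured against the baseline WER, so an 11% relative reduction is not the same as an 11-point absolute drop in WER.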
