ASCL | May 19, 2023

A New Benchmark of Aphasia Speech Recognition and Detection Based on E-Branchformer and Multi-task Learning

arXiv:2305.13331v1 | 14 citations | Has Code
AI Analysis

This work addresses the processing of disordered speech from patients with aphasia, providing a standardized benchmark and tools to support research in this specialized area.

The paper tackles aphasia speech recognition and detection by introducing a new benchmark using multi-task learning with the CTC/Attention architecture, achieving state-of-the-art speaker-level detection accuracy of 97.3% and an 11% relative WER reduction for moderate aphasia patients.

Aphasia is a language disorder that affects the speaking ability of millions of patients. This paper presents a new benchmark for aphasia speech recognition and detection tasks using state-of-the-art speech recognition techniques with the AphasiaBank dataset. Specifically, we introduce two multi-task learning methods based on the CTC/Attention architecture to perform both tasks simultaneously. Our system achieves state-of-the-art speaker-level detection accuracy (97.3%) and a relative WER reduction of 11% for patients with moderate aphasia. In addition, we demonstrate the generalizability of our approach by applying it to another disordered speech database, the DementiaBank Pitt corpus. We will make our all-in-one recipes and pre-trained model publicly available to facilitate reproducibility. Our standardized data preprocessing pipeline and open-source recipes enable researchers to compare results directly, promoting progress in disordered speech processing.
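To make the "relative WER reduction" figure concrete, the following is a minimal sketch of how word error rate and a relative reduction are commonly computed. It is not taken from the paper's recipes; the function names (`edit_distance`, `wer`, `relative_wer_reduction`) and the example numbers are hypothetical and chosen only for illustration.

```python
def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion of a reference word
                curr[j - 1] + 1,         # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution (cost 0 on a match)
            ))
        prev = curr
    return prev[-1]


def wer(refs, hyps):
    """Corpus-level WER: total word errors divided by total reference words."""
    errors = sum(edit_distance(r.split(), h.split()) for r, h in zip(refs, hyps))
    words = sum(len(r.split()) for r in refs)
    return errors / words


def relative_wer_reduction(baseline_wer, new_wer):
    """Fraction of the baseline WER removed by the new system."""
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical transcripts, for illustration only.
example_wer = wer(["the cat sat on the mat"], ["the cat sit on mat"])

# Hypothetical WERs: a drop from 40.0% to 35.6% is an 11% relative reduction.
example_reduction = relative_wer_reduction(0.40, 0.356)
```

Note that a relative reduction is measured against the baseline error rate, so an 11% relative reduction is a much smaller change in absolute WER points.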

