Careful Whisper -- leveraging advances in automatic speech recognition for robust and interpretable aphasia subtype classification
The work provides a robust and interpretable tool for diagnosing aphasia subtypes, with potential applicability to other diseases and languages, though its contribution is incremental in that it combines existing methods.
The paper tackles the problem of automatically identifying speech anomalies from voice recordings to assess speech impairments, achieving human-level accuracy in distinguishing people with aphasia from healthy controls and 90% accuracy in classifying the most frequent aphasia subtypes.
This paper presents a fully automated approach for identifying speech anomalies from voice recordings to aid in the assessment of speech impairments. By combining Connectionist Temporal Classification (CTC) and encoder-decoder-based automatic speech recognition models, we generate both rich acoustic transcripts and clean transcripts. We then apply several natural language processing methods to extract features from these transcripts and build prototypes of healthy speech. Basic distance measures from these prototypes serve as input features for standard machine learning classifiers, yielding human-level accuracy in distinguishing recordings of people with aphasia from those of a healthy control group. Furthermore, the most frequently occurring aphasia types can be distinguished with 90% accuracy. The pipeline is directly applicable to other diseases and languages, showing promise for robustly extracting diagnostic speech biomarkers.
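The idea of measuring distance from a prototype of healthy speech can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the three feature columns (speech rate, filler ratio, mean pause length), the made-up numbers, the choice of per-feature mean and standard deviation as the prototype, and the z-score style distance. The paper's actual features and distance measures may differ.

```python
import math
from statistics import mean, stdev

# Hypothetical per-recording features: speech rate, filler ratio,
# mean pause length. All values are invented for illustration only.
healthy = [
    [3.1, 0.04, 0.40],
    [2.9, 0.05, 0.45],
    [3.3, 0.03, 0.38],
    [3.0, 0.06, 0.42],
]

def build_prototype(rows):
    """Summarise healthy speech as per-feature mean and standard deviation."""
    columns = list(zip(*rows))
    return [mean(c) for c in columns], [stdev(c) for c in columns]

def distance_features(row, prototype):
    """Per-feature absolute z-score distance from the healthy prototype."""
    means, stds = prototype
    return [abs(x - m) / s for x, m, s in zip(row, means, stds)]

def overall_distance(row, prototype):
    """Euclidean norm of the per-feature distances (a single summary score)."""
    return math.sqrt(sum(d * d for d in distance_features(row, prototype)))

prototype = build_prototype(healthy)

# Two hypothetical new recordings: one close to the healthy group, one far.
control_like = [3.05, 0.045, 0.41]
impaired_like = [1.2, 0.20, 1.10]

for name, row in [("control_like", control_like), ("impaired_like", impaired_like)]:
    print(name, round(overall_distance(row, prototype), 2))
```

In a pipeline of the kind the abstract describes, the vectors returned by `distance_features` would be passed to a standard classifier rather than thresholded by hand. Keeping one distance per feature is also what makes the result easy to interpret, since a large value points to the specific property of speech that deviates from the healthy prototype.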