Performant ASR Models for Medical Entities in Accented Speech
This work addresses a critical safety issue for healthcare applications, where ASR errors on medical terms could harm patients, though the contribution is incremental because it builds on existing fine-tuning methods.
The paper tackles the problem of automatic speech recognition (ASR) models performing poorly on medical named entities in accented speech, and finds that fine-tuning on accented clinical data reduces medical word error rate (medical WER) by 25-34% relative.
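Reading "relative" in its usual sense, the improvement is the drop in medical WER expressed as a fraction of the baseline (not fine-tuned) model's medical WER. The formula below is a sketch of that standard definition, assumed rather than quoted from the paper:

```latex
\Delta_{\text{rel}} = \frac{\mathrm{WER}^{\text{med}}_{\text{base}} - \mathrm{WER}^{\text{med}}_{\text{ft}}}{\mathrm{WER}^{\text{med}}_{\text{base}}}
```

For instance, a hypothetical drop from 40% to 28% medical WER gives (40 - 28) / 40 = 0.30, a 30% relative reduction. The same 12-point drop from a 60% baseline would be only 20% relative, which is why relative and absolute figures should not be compared directly.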
Recent strides in automatic speech recognition (ASR) have accelerated its application in the medical domain, yet its performance on accented medical named entities (NE), such as drug names, diagnoses, and lab results, is largely unknown. We rigorously evaluate multiple ASR models on a clinical English dataset covering 93 African accents. Our analysis reveals that although some models achieve low overall word error rates (WER), their error rates on clinical entities are higher, potentially posing substantial risks to patient safety. To demonstrate this empirically, we extract clinical entities from transcripts, develop a novel algorithm to align ASR predictions with these entities, and compute medical NE recall, medical WER, and character error rate (CER). Our results show that fine-tuning on accented clinical speech improves medical WER by a wide margin (25-34% relative), making the models more practical for healthcare environments.
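The metrics named above can be illustrated with a minimal Python sketch. It is not the paper's implementation. The paper's entity-alignment algorithm is not described here, so `medical_wer` uses a hypothetical stand-in that matches each entity to its closest window of hypothesis words. The lowercase and whitespace normalization, the exact-match rule in `ne_recall`, and all function names and example sentences are also assumptions made for illustration.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))  # distances from an empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r
                curr[j - 1] + 1,         # insert h
                prev[j - 1] + (r != h),  # substitute (free when r == h)
            )
        prev = curr
    return prev[-1]


def wer(ref: str, hyp: str) -> float:
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = ref.lower().split()
    return edit_distance(ref_words, hyp.lower().split()) / max(len(ref_words), 1)


def cer(ref: str, hyp: str) -> float:
    """Character error rate: character-level edits divided by reference length."""
    r = ref.lower()
    return edit_distance(r, hyp.lower()) / max(len(r), 1)


def ne_recall(entities: List[str], hyp: str) -> float:
    """Assumed rule: an entity counts as recalled only if its words appear
    verbatim as a contiguous span in the hypothesis."""
    hyp_words = hyp.lower().split()
    found = 0
    for ent in entities:
        ent_words = ent.lower().split()
        n = len(ent_words)
        if any(hyp_words[k:k + n] == ent_words for k in range(len(hyp_words) - n + 1)):
            found += 1
    return found / max(len(entities), 1)


def medical_wer(entities: List[str], hyp: str) -> float:
    """Hypothetical entity-level WER (not the paper's alignment algorithm):
    align each entity to its closest hypothesis window of n-1..n+1 words and
    count word edits against that window."""
    hyp_words = hyp.lower().split()
    errors = total = 0
    for ent in entities:
        ent_words = ent.lower().split()
        n = len(ent_words)
        best = min(
            (edit_distance(ent_words, hyp_words[k:k + length])
             for length in range(max(1, n - 1), n + 2)
             for k in range(len(hyp_words) - length + 1)),
            default=n,  # no window fits: treat every entity word as deleted
        )
        errors += best
        total += n
    return errors / max(total, 1)


# Hypothetical example: the drug name is split into two words by the ASR model.
ref = "patient was prescribed metformin for type two diabetes"
hyp = "patient was prescribed met formin for type two diabetes"
entities = ["metformin", "type two diabetes"]

print(f"WER={wer(ref, hyp):.3f} CER={cer(ref, hyp):.3f}")  # WER=0.250 CER=0.019
print(f"NE recall={ne_recall(entities, hyp):.2f} medical WER={medical_wer(entities, hyp):.2f}")  # NE recall=0.50 medical WER=0.25
```

In this toy case a single character-level slip (an inserted space) costs only about 2% CER but drops "metformin" from the recalled entities. This shows how character-level scores can understate entity-level failures.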