Dysarthria Normalization via Local Lie Group Transformations for Robust ASR
This work addresses robust ASR for speakers with dysarthria and offers a zero-shot method with consistent gains, although the contribution is incremental because it builds on existing geometric warping techniques.
The paper tackles automatic speech recognition (ASR) for dysarthric speech by normalizing its distortions with local Lie group transformations, achieving up to a 17 percentage-point reduction in word error rate (WER) on challenging TORGO utterances and a 16 percent drop in WER variance.
We present a geometry-driven method for normalizing dysarthric speech that models time, frequency, and amplitude distortions as smooth, local Lie group transformations of spectrograms. Scalar fields generate these deformations via exponential maps, and a neural network, trained only on synthetically warped healthy speech, infers the fields and applies an approximate inverse at test time. We also introduce a spontaneous-symmetry-breaking (SSB) potential that encourages the model to discover non-trivial field configurations. On real pathological speech, the system delivers consistent gains: up to a 17 percentage-point WER reduction on challenging TORGO utterances and a 16 percent drop in WER variance, with no degradation on clean CommonVoice data. Character and phoneme error rates improve in parallel, confirming linguistic relevance. Our results demonstrate that geometrically structured warping provides consistent, zero-shot robustness gains for dysarthric ASR.
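The abstract does not spell out how a scalar field becomes a warp, so the NumPy sketch below shows one plausible reading under stated assumptions. Each time or frequency field is treated as a velocity (in bins) along its own axis, the exponential map is approximated as the time-1 flow of that velocity integrated with forward Euler steps, and the amplitude field acts multiplicatively through `exp`. Because exp(-v) inverts exp(v), negating the fields gives an approximate inverse. The functions `flow_map`, `resample`, `warp`, and `unwarp`, the separable 1-D time and frequency fields, and the hand-picked example fields are all hypothetical illustrations, not the authors' implementation; in the paper, a neural network infers the fields from the input.

```python
# Hypothetical sketch of exponential-map warps on a spectrogram.
# NOT the paper's implementation: field shapes and integration are assumptions.
import numpy as np

def flow_map(phi, n_steps=32):
    """Approximate exp(phi * d/dx) on the grid 0..N-1.

    phi holds a velocity (in bins) at each grid point. The time-1 flow of
    dx/ds = phi(x) is integrated with forward Euler; phi is linearly
    interpolated between grid points and positions are clipped to the grid.
    """
    n = phi.shape[0]
    grid = np.arange(n, dtype=float)
    x = grid.copy()
    dt = 1.0 / n_steps
    for _ in range(n_steps):
        x = np.clip(x + dt * np.interp(x, grid, phi), 0.0, n - 1.0)
    return x

def resample(spec, pos, axis):
    """Linearly interpolate spec at fractional positions pos along axis."""
    moved = np.moveaxis(spec, axis, 0)
    lo = np.floor(pos).astype(int)
    hi = np.minimum(lo + 1, moved.shape[0] - 1)
    w = (pos - lo).reshape(-1, *([1] * (moved.ndim - 1)))
    out = (1.0 - w) * moved[lo] + w * moved[hi]
    return np.moveaxis(out, 0, axis)

def warp(spec, phi_t, phi_f, alpha):
    """Time warp, then frequency warp, then amplitude scaling of a (freq, time) magnitude spectrogram."""
    out = resample(spec, flow_map(phi_t), axis=1)  # time axis
    out = resample(out, flow_map(phi_f), axis=0)   # frequency axis
    return np.exp(alpha) * out                     # exp map of the multiplicative group

def unwarp(spec, phi_t, phi_f, alpha):
    """Approximate inverse: undo each step in reverse order with negated fields."""
    out = np.exp(-alpha) * spec
    out = resample(out, flow_map(-phi_f), axis=0)
    return resample(out, flow_map(-phi_t), axis=1)

# Usage with hand-written smooth fields (a trained model would predict these).
F, T = 40, 100
t, f = np.arange(T), np.arange(F)
spec = 2.0 + np.sin(0.1 * t)[None, :] + np.cos(0.05 * f)[:, None]
phi_t = 2.0 * np.exp(-((t - 50) / 10.0) ** 2)   # local time stretch near frame 50
phi_f = 1.0 * np.exp(-((f - 20) / 6.0) ** 2)    # local frequency shift near bin 20
alpha = 0.1 * np.sin(0.05 * t)[None, :] * np.ones((F, 1))
warped = warp(spec, phi_t, phi_f, alpha)
restored = unwarp(warped, phi_t, phi_f, alpha)
print("max round-trip error:", np.max(np.abs(restored - spec)))
```

The round trip is only approximate because of the Euler integration, linear interpolation, and clipping at the grid edges; keeping the fields small and smooth keeps each flow a well-behaved, order-preserving resampling.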