Evaluating and Improving the Robustness of Speech Command Recognition Models to Noise and Distribution Shifts
This work addresses the underexplored issue of out-of-distribution robustness in audio-based models. Its contribution is incremental: it carries concepts from prior computer vision research over to speech recognition.
The study tackles the problem of evaluating and improving the robustness of speech command recognition models to noise and distribution shifts. It finds that noise-aware training improves robustness in some configurations, and it quantifies these effects with two metrics, Fairness (F) and Robustness (R).
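The abstract does not say how noise-aware training is implemented. One common form is additive noise augmentation at a target signal-to-noise ratio (SNR), applied on the fly to part of each training batch. The minimal Python sketch below assumes that form. The function names (`signal_power`, `mix_at_snr`, `augment_batch`), the 0 to 20 dB SNR range, and the 50% mixing probability are hypothetical choices for illustration, not the paper's settings.

```python
import math
import random
from typing import List, Sequence, Tuple


def signal_power(signal: Sequence[float]) -> float:
    """Mean squared amplitude of a signal."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech: Sequence[float], noise: Sequence[float], snr_db: float) -> List[float]:
    """Add a random segment of `noise` to `speech`, scaled to the target SNR in dB."""
    if len(noise) < len(speech):
        raise ValueError("noise clip must be at least as long as the speech clip")
    start = random.randint(0, len(noise) - len(speech))
    segment = noise[start:start + len(speech)]
    p_noise = signal_power(segment)
    if p_noise == 0.0:
        return list(speech)  # silent noise segment: nothing to add
    # SNR_dB = 10 * log10(P_speech / (g^2 * P_noise)), solved for the noise gain g.
    gain = math.sqrt(signal_power(speech) / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, segment)]


def augment_batch(
    batch: List[List[float]],
    noise_bank: List[List[float]],
    snr_range_db: Tuple[float, float] = (0.0, 20.0),  # hypothetical SNR range
    p_noisy: float = 0.5,  # hypothetical fraction of clips that receive noise
) -> List[List[float]]:
    """Hypothetical noise-aware training step: corrupt a random subset of clips."""
    out = []
    for clip in batch:
        if random.random() < p_noisy:
            noise = random.choice(noise_bank)
            out.append(mix_at_snr(clip, noise, random.uniform(*snr_range_db)))
        else:
            out.append(list(clip))  # keep this clip clean
    return out
```

Scaling the noise rather than the speech keeps the keyword's loudness unchanged, so only the SNR varies between augmented copies of the same clip.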
Although prior work in computer vision has shown strong correlations between in-distribution (ID) and out-of-distribution (OOD) accuracies, such relationships remain underexplored in audio-based models. In this study, we investigate how training conditions and input features affect the robustness and generalization abilities of spoken keyword classifiers under OOD conditions. We benchmark several neural architectures across a variety of evaluation sets. To quantify the impact of noise on generalization, we use two metrics: Fairness (F), which measures overall accuracy gains relative to a baseline model, and Robustness (R), which assesses the convergence between ID and OOD performance. Our results suggest that noise-aware training improves robustness in some configurations. These findings shed new light on the benefits and limitations of noise-based augmentation for generalization in speech models.
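The abstract describes Fairness (F) and Robustness (R) only in words, so their exact formulas are not known from this text. The Python sketch below is one hypothetical way to make them concrete, not the paper's definitions: F as the mean per-set accuracy gain of a model over a baseline across all evaluation sets, and R as the ratio of mean OOD accuracy to ID accuracy, which approaches 1 as the two converge. The function names, the dictionary input format, the evaluation-set names, and the example accuracies are all illustrative assumptions.

```python
from statistics import mean
from typing import Dict


def fairness(model_acc: Dict[str, float], baseline_acc: Dict[str, float]) -> float:
    """Hypothetical F: mean accuracy gain over the baseline across all evaluation sets."""
    if model_acc.keys() != baseline_acc.keys():
        raise ValueError("model and baseline must be scored on the same evaluation sets")
    return mean(model_acc[name] - baseline_acc[name] for name in model_acc)


def robustness(id_acc: float, ood_acc: Dict[str, float]) -> float:
    """Hypothetical R: mean OOD accuracy divided by ID accuracy (1.0 means no ID/OOD gap)."""
    if id_acc <= 0.0:
        raise ValueError("ID accuracy must be positive")
    return mean(ood_acc.values()) / id_acc


# Illustrative, made-up accuracies; these are NOT results from the paper.
baseline = {"id": 0.90, "ood_noisy": 0.60, "ood_other": 0.70}
noise_aware = {"id": 0.91, "ood_noisy": 0.75, "ood_other": 0.72}

f = fairness(noise_aware, baseline)
r_base = robustness(baseline["id"], {k: v for k, v in baseline.items() if k != "id"})
r_model = robustness(noise_aware["id"], {k: v for k, v in noise_aware.items() if k != "id"})
print(f"F = {f:+.3f}, R(baseline) = {r_base:.3f}, R(noise-aware) = {r_model:.3f}")
```

Under these assumed definitions, F > 0 means the model beats the baseline on average over all sets, while an R closer to 1 means the model loses less accuracy when moving from ID to OOD data. The two can disagree: a model can raise every accuracy (positive F) without narrowing the ID/OOD gap.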