Sequential Randomized Smoothing for Adversarially Robust Speech Recognition
This paper addresses adversarial robustness for speech recognition systems, offering a domain-specific, incremental improvement over existing defenses.
The paper tackles the problem of defending Automatic Speech Recognition (ASR) systems against adversarial attacks by adapting the Randomized Smoothing paradigm to handle sequential outputs. The resulting model is robust to attacks that use inaudible noise and can only be broken with very high distortion.
While Automatic Speech Recognition has been shown to be vulnerable to adversarial attacks, defenses against these attacks are still lagging. Existing, naive defenses can be partially broken with an adaptive attack. In classification tasks, the Randomized Smoothing paradigm has been shown to be effective at defending models. However, it is difficult to apply this paradigm to ASR tasks, due to their complexity and the sequential nature of their outputs. Our paper overcomes some of these challenges by leveraging speech-specific tools like enhancement and ROVER voting to design an ASR model that is robust to perturbations. We apply adaptive versions of state-of-the-art attacks, such as the Imperceptible ASR attack, to our model, and show that our strongest defense is robust to all attacks that use inaudible noise, and can only be broken with very high distortion.
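As a rough illustration of the idea in the abstract, the sketch below transcribes several Gaussian-noised copies of an input and combines the resulting word sequences by voting. It is a minimal sketch under stated assumptions, not the paper's implementation: `transcribe` and `enhance` are hypothetical callables standing in for an ASR model and a speech enhancement front end, the defaults for `sigma` and `n_samples` are arbitrary, and `vote_by_position` is a simplified stand-in for ROVER that skips ROVER's alignment step.

```python
import random
from collections import Counter
from typing import Callable, List, Optional, Sequence

Waveform = List[float]
Transcript = List[str]


def add_gaussian_noise(waveform: Sequence[float], sigma: float,
                       rng: random.Random) -> Waveform:
    """Return a copy of the waveform with i.i.d. Gaussian noise (std sigma) added."""
    return [x + rng.gauss(0.0, sigma) for x in waveform]


def vote_by_position(transcripts: Sequence[Transcript]) -> Transcript:
    """Simplified stand-in for ROVER (assumption): no alignment is performed.

    Keeps the hypotheses whose length is the most common one, then takes a
    per-position majority vote over their words.
    """
    if not transcripts:
        raise ValueError("need at least one transcript to vote on")
    common_len = Counter(len(t) for t in transcripts).most_common(1)[0][0]
    kept = [t for t in transcripts if len(t) == common_len]
    return [Counter(words).most_common(1)[0][0] for words in zip(*kept)]


def smoothed_transcribe(
    waveform: Sequence[float],
    transcribe: Callable[[Waveform], Transcript],  # hypothetical ASR model
    sigma: float = 0.01,                           # illustrative noise level
    n_samples: int = 8,                            # illustrative sample count
    enhance: Optional[Callable[[Waveform], Waveform]] = None,  # hypothetical
    seed: int = 0,
) -> Transcript:
    """Transcribe n_samples noisy copies of the input and vote on the result."""
    rng = random.Random(seed)
    hypotheses = []
    for _ in range(n_samples):
        noisy = add_gaussian_noise(waveform, sigma, rng)
        if enhance is not None:
            noisy = enhance(noisy)  # optional denoising before recognition
        hypotheses.append(transcribe(noisy))
    return vote_by_position(hypotheses)
```

Voting over word sequences rather than class labels is what distinguishes this setting from smoothing a classifier. A real system would align hypotheses of different lengths before voting, as ROVER does, instead of discarding them.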