Universal Adversarial Perturbations for Speech Recognition Systems
This work addresses security vulnerabilities in speech recognition systems, an issue critical for applications such as voice assistants and transcription services. It is incremental in that it extends adversarial attack concepts from vision to audio.
The authors demonstrate that universal adversarial perturbations can cause mis-transcription in automatic speech recognition systems: a single quasi-imperceptible perturbation fools Mozilla DeepSpeech and shows transferability to a WaveNet-based ASR system.
In this work, we demonstrate the existence of universal adversarial audio perturbations that cause mis-transcription of audio signals by automatic speech recognition (ASR) systems. We propose an algorithm to find a single quasi-imperceptible perturbation which, when added to any arbitrary speech signal, will most likely fool the victim speech recognition model. Our experiments demonstrate the application of our proposed technique by crafting audio-agnostic universal perturbations for a state-of-the-art ASR system, Mozilla DeepSpeech. Additionally, we show that such perturbations generalize to a significant extent across models that are not available during training, by performing a transferability test on a WaveNet-based ASR system.
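The idea of a single perturbation shared across all inputs can be illustrated with a minimal sketch. This is not the algorithm proposed in the paper, whose details the abstract does not give. It assumes a toy linear softmax classifier as a stand-in for an ASR model such as DeepSpeech, random vectors as stand-ins for speech signals, and signed-gradient updates projected onto an L-infinity ball of radius `eps` as the search procedure. All names (`universal_perturbation`, `fooling_rate`, `eps`, `step`) are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max()                      # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def predict(W, x):
    # Toy stand-in for "transcription": the arg-max class of a linear model.
    return int(np.argmax(W @ x))

def loss_grad(W, x, label):
    # Gradient of the cross-entropy loss with respect to the input x.
    p = softmax(W @ x)
    p[label] -= 1.0
    return W.T @ p

def universal_perturbation(W, signals, eps=0.05, step=0.01, epochs=20):
    # Clean predictions act as the reference outputs to move away from.
    labels = [predict(W, x) for x in signals]
    v = np.zeros_like(signals[0])
    for _ in range(epochs):
        for x, y in zip(signals, labels):
            if predict(W, x + v) == y:   # this signal is not fooled yet
                v = v + step * np.sign(loss_grad(W, x + v, y))
                v = np.clip(v, -eps, eps)  # keep the perturbation small
    return v

def fooling_rate(W, signals, v):
    # Fraction of signals whose prediction changes once v is added.
    return float(np.mean([predict(W, x + v) != predict(W, x) for x in signals]))

# Synthetic data only: 50 random "signals" of length 64 and a 5-class model.
rng = np.random.default_rng(0)
W = rng.normal(size=(5, 64))
signals = [rng.normal(scale=0.1, size=64) for _ in range(50)]

v = universal_perturbation(W, signals)
rate = fooling_rate(W, signals, v)
print(f"max |v| = {np.max(np.abs(v)):.3f}, fooling rate = {rate:.2f}")
```

The same vector `v` is added to every signal, which is what makes the perturbation universal rather than input-specific. A real attack on an ASR system would replace the toy classifier with the model's own loss (for example, one defined on transcriptions) and would measure success by transcription error rather than by a class change.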