Adversarial Black-Box Attacks on Automatic Speech Recognition Systems using Multi-Objective Evolutionary Optimization
This work addresses security risks in ASR systems used in applications such as voice assistants, though its contribution is incremental because it builds on existing adversarial attack methods.
The paper addresses the vulnerability of Automatic Speech Recognition (ASR) systems to adversarial attacks by proposing a black-box attack framework based on multi-objective evolutionary optimization. The attacks raise the Word Error Rate (WER) of Deepspeech and Kaldi-ASR by up to 980% while maintaining high acoustic similarity to the original audio.
Fooling deep neural networks with adversarial inputs has exposed a significant vulnerability in current state-of-the-art systems across multiple domains. Both black-box and white-box approaches have been used either to replicate the model itself or to craft examples that cause the model to fail. In this work, we propose a framework that uses multi-objective evolutionary optimization to perform both targeted and un-targeted black-box attacks on Automatic Speech Recognition (ASR) systems. We apply this framework to two ASR systems, Deepspeech and Kaldi-ASR, and increase their Word Error Rates (WER) by up to 980%, indicating the potency of our approach. During un-targeted and targeted attacks, the adversarial samples maintain a high acoustic similarity of 0.98 and 0.97, respectively, with the original audio.
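
To make the reported quantities concrete, the following is a minimal sketch and not the authors' implementation. It computes WER with the standard word-level edit distance, expresses a change in WER as a relative percentage increase, and keeps the non-dominated candidates when two objectives are maximized together. The function names, the choice of (WER, similarity) as the two objectives, and all numbers are assumptions made for illustration; the abstract does not say how acoustic similarity is measured, so it appears here only as a score in [0, 1].

```python
from typing import List, Tuple


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def relative_increase_percent(before: float, after: float) -> float:
    """Relative increase of `after` over `before`, in percent."""
    return 100.0 * (after - before) / before


def pareto_front(candidates: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Keep candidates not dominated by any other.

    Each candidate is (wer, similarity); both are maximized here, which is an
    assumed formulation of the multi-objective trade-off.
    """
    def dominates(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
        return a[0] >= b[0] and a[1] >= b[1] and (a[0] > b[0] or a[1] > b[1])

    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


if __name__ == "__main__":
    # Hypothetical transcripts: two substitutions over four reference words.
    print(word_error_rate("turn on the lights", "turn off the light"))
    # Hypothetical WER values chosen to illustrate a roughly 980% increase.
    print(relative_increase_percent(0.05, 0.54))
    # Hypothetical (wer, similarity) scores for four perturbed audio samples.
    print(pareto_front([(0.5, 0.98), (0.4, 0.97), (0.6, 0.90), (0.3, 0.99)]))
```

Under this reading, a 980% increase means the attacked WER is about 10.8 times the clean WER, not 9.8 times. The Pareto filter illustrates why a multi-objective formulation suits the problem: a candidate with a higher WER but lower similarity is not discarded in favor of a more similar but less damaging one, since neither dominates the other.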