Universal adversarial examples in speech command classification
This work addresses adversarial robustness in audio-based machine learning for speech command systems. It extends concepts already established in the image domain to audio, and is therefore an incremental advance rather than a fundamentally new direction.
The paper investigates whether universal adversarial perturbations exist for speech command classification and provides evidence that such attacks can generalize across different models. It also proposes a novel analytical framework for evaluating perturbations at varying levels of universality, showing that generating effective perturbations becomes less feasible as the universality level increases.
Adversarial examples are inputs intentionally perturbed to force a machine learning model to produce a wrong prediction, while keeping the changes difficult for a human to detect. Although this topic has been studied intensively in the image domain, classification tasks in the audio domain have received less attention. In this paper we address the existence of universal perturbations for speech command classification. We provide evidence that universal attacks can be generated for speech command classification tasks and that these attacks generalize across different models to a significant extent. Additionally, we propose a novel analytical framework for evaluating universal perturbations under different levels of universality, demonstrating that the feasibility of generating effective perturbations decreases as the universality level increases. Finally, we propose a more detailed and rigorous framework to measure the amount of distortion introduced by the perturbations, demonstrating that the conventionally employed distortion measures are not realistic for audio-based problems.
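The abstract contrasts its distortion framework with conventional measures and evaluates one perturbation across many inputs. As a minimal sketch, assuming (i) the conventional measure is the peak-amplitude relative loudness dB_x(δ) = dB(δ) - dB(x), with dB(x) = 20·log10(max_i |x_i|), as used in earlier audio attack work, and (ii) universality is summarized by a fooling rate, meaning the fraction of inputs whose predicted label changes when one shared perturbation δ is added, the Python below computes both quantities. The helpers `db`, `relative_db` and `fooling_rate` are hypothetical illustrations. They do not reproduce the paper's proposed distortion framework, which the abstract argues is more realistic than measures like dB_x.

```python
import math
from typing import Callable, Hashable, Sequence


def db(signal: Sequence[float]) -> float:
    """Peak-based loudness in decibels: 20 * log10(max_i |x_i|)."""
    peak = max(abs(s) for s in signal)
    if peak == 0.0:
        # log10(0) is undefined; a silent signal has no finite loudness.
        raise ValueError("signal is silent (peak amplitude is zero)")
    return 20.0 * math.log10(peak)


def relative_db(delta: Sequence[float], x: Sequence[float]) -> float:
    """Assumed conventional measure dB_x(delta) = dB(delta) - dB(x).

    More negative values mean the perturbation is quieter relative to x.
    """
    return db(delta) - db(x)


def fooling_rate(
    classify: Callable[[Sequence[float]], Hashable],
    inputs: Sequence[Sequence[float]],
    delta: Sequence[float],
) -> float:
    """Fraction of inputs whose predicted label changes when the SAME delta is added.

    Reusing one delta for every input is what makes the perturbation universal.
    """
    changed = 0
    for x in inputs:
        perturbed = [xi + di for xi, di in zip(x, delta)]
        if classify(perturbed) != classify(x):
            changed += 1
    return changed / len(inputs)


if __name__ == "__main__":
    # Toy data and a toy "classifier" (sign of the sample sum), for illustration only.
    x1 = [0.5, -0.2, 0.1]
    x2 = [-0.01, 0.0, 0.0]
    delta = [0.05, 0.05, -0.05]
    toy_classifier = lambda s: sum(s) >= 0
    print(relative_db(delta, x1))                              # about -20.0 dB
    print(fooling_rate(toy_classifier, [x1, x2], delta))       # 0.5
```

In the toy run, δ has one tenth of x1's peak amplitude, giving about -20 dB relative loudness, and it flips the label of x2 but not x1, for a fooling rate of 0.5. A single peak-ratio number like this ignores how the perturbation is distributed over time and frequency, which is the kind of limitation the abstract attributes to conventional measures.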