Almost for Free: Crafting Adversarial Examples with Convolutional Image Filters
This work provides a computationally efficient and transferable attack for deceiving neural networks, highlighting their vulnerability to simple perturbations.
The authors propose a simple method to craft adversarial examples using convolutional image filters based on edge detection. With 3x3 filters, the attack achieves success rates between 30% and 80% on different neural networks, and it uses five orders of magnitude fewer parameters than approaches based on generative models.
Adversarial examples in machine learning are typically generated using gradients, obtained either directly through access to the model or approximated via queries to it. In this paper, we propose a much simpler approach to craft adversarial examples, drawing on insights from explainable machine learning. In particular, we design adversarial image filters that are based on classic edge detection algorithms but optimized to deceive learning models. The resulting untargeted attacks are transferable and require only a single pass over the input. Empirically, we find that 3x3 filters already enable success rates between 30% and 80% on different neural networks. Compared to related approaches that use generative models for crafting adversarial examples, we reduce the number of parameters by five orders of magnitude, resulting in a very efficient attack. When investigating the parameters of the learned filters, we observe interesting properties, such as high transferability between models and structures common to classic image filters. Our results provide further insights into the vulnerability of neural networks and their fragility to malicious noise.
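To make the "single pass over the input" concrete, the following is a minimal sketch of how a 3x3 filter could be turned into an image perturbation. It is not the authors' implementation: the abstract does not give the learned filter weights, the way the filter output is combined with the image, or any perturbation budget. The Laplacian kernel below is only a classic edge detector standing in for a learned filter, and the helper names (`filter2d`, `apply_filter_perturbation`), the additive combination, and the `eps` bound are all assumptions made for illustration.

```python
import numpy as np

# Classic 3x3 Laplacian edge-detection kernel. It is a stand-in here:
# the learned adversarial filter weights are not given in the text.
LAPLACIAN = np.array([[0.0, 1.0, 0.0],
                      [1.0, -4.0, 1.0],
                      [0.0, 1.0, 0.0]])


def filter2d(channel, kernel):
    """Same-size 2D cross-correlation with edge-replicating padding."""
    kh, kw = kernel.shape
    h, w = channel.shape
    padded = np.pad(channel, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="edge")
    out = np.zeros((h, w), dtype=float)
    # Accumulate one shifted copy of the image per kernel entry.
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[i:i + h, j:j + w]
    return out


def apply_filter_perturbation(image, kernel=LAPLACIAN, eps=8 / 255):
    """Hypothetical combination rule (an assumption, not from the paper):
    add the filter response, clipped to [-eps, eps], to an H x W x C image
    with values in [0, 1]."""
    image = np.asarray(image, dtype=float)
    response = np.stack(
        [filter2d(image[..., c], kernel) for c in range(image.shape[-1])],
        axis=-1,
    )
    delta = np.clip(response, -eps, eps)
    return np.clip(image + delta, 0.0, 1.0)
```

Under these assumptions the whole attack is parameterized by the nine kernel entries, and producing a perturbed image needs no gradient of the target model at inference time. In the setting the abstract describes, the kernel entries would be optimized against a model beforehand rather than fixed to a textbook edge detector.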