HyperNetworks with statistical filtering for defending adversarial examples
This work addresses the problem of adversarial robustness for convolutional neural networks in image classification, offering a method that reduces computational complexity compared to existing defenses, though it appears incremental in nature.
The paper tackles the vulnerability of deep learning to adversarial attacks in image classification by proposing a HyperNetwork that uses statistical filtering to compute dynamic convolution kernels. The reported result is improved robustness without additional detection algorithms, demonstrated empirically against several attack types.
Deep learning algorithms are known to be vulnerable to adversarial perturbations in various tasks such as image classification. This problem has been addressed by employing several defense methods for detection and rejection of particular types of attacks. However, training and manipulating networks according to particular defense schemes increases the computational complexity of the learning algorithms. In this work, we propose a simple yet effective method to improve the robustness of convolutional neural networks (CNNs) to adversarial attacks by using data-dependent adaptive convolution kernels. To this end, we propose a new type of HyperNetwork that employs statistical properties of input data and features to compute statistical adaptive maps. We then filter the convolution weights of CNNs with the learned statistical maps to compute dynamic kernels. Thereby, weights and kernels are collectively optimized to learn image classification models that are robust to adversarial attacks without employing additional target detection and rejection algorithms. We empirically demonstrate that the proposed method enables CNNs to spontaneously defend against different types of attacks, e.g., attacks generated by Gaussian noise, fast gradient sign methods (Goodfellow et al., 2014), and a black-box attack (Narodytska & Kasiviswanathan, 2016).
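The abstract's pipeline (input statistics, then a statistical map, then filtered convolution weights) can be illustrated with a minimal pure-Python sketch. The abstract does not specify the HyperNetwork's architecture, so everything concrete here is an assumption made for illustration: the statistics (mean and standard deviation of a single-channel input), the one-layer linear map with a sigmoid, the elementwise gating of the weights, and all function and variable names. It is not the authors' implementation.

```python
import math
import random


def input_stats(image):
    """Mean and standard deviation of a single-channel image (2-D list).
    Assumption: the abstract only says 'statistical properties'."""
    vals = [v for row in image for v in row]
    mean = sum(vals) / len(vals)
    var = sum((v - mean) ** 2 for v in vals) / len(vals)
    return [mean, math.sqrt(var)]


def statistical_map(stats, hyper_w, hyper_b):
    """Hypothetical one-layer hypernetwork: one sigmoid gate per kernel weight."""
    gates = []
    for row, b in zip(hyper_w, hyper_b):
        z = sum(w * s for w, s in zip(row, stats)) + b
        gates.append(1.0 / (1.0 + math.exp(-z)))
    return gates


def dynamic_kernel(base_kernel, gates):
    """Filter the learned (static) weights with the input-dependent map."""
    return [w * g for w, g in zip(base_kernel, gates)]


def conv2d_valid(image, kernel, k):
    """Plain 'valid' cross-correlation with a flattened k x k kernel."""
    h, w = len(image), len(image[0])
    out = []
    for i in range(h - k + 1):
        out_row = []
        for j in range(w - k + 1):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    acc += image[i + di][j + dj] * kernel[di * k + dj]
            out_row.append(acc)
        out.append(out_row)
    return out


def forward(image, base_kernel, hyper_w, hyper_b, k=3):
    """Compute the input-dependent kernel, then convolve with it."""
    gates = statistical_map(input_stats(image), hyper_w, hyper_b)
    kernel = dynamic_kernel(base_kernel, gates)
    return conv2d_valid(image, kernel, k), kernel


rng = random.Random(0)
K = 3
# Randomly initialised stand-ins for parameters that would be learned jointly.
base_kernel = [rng.uniform(-1, 1) for _ in range(K * K)]
hyper_w = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(K * K)]
hyper_b = [0.0] * (K * K)

clean = [[rng.random() for _ in range(5)] for _ in range(5)]
# Gaussian-noise perturbation, one of the attack types named in the abstract.
noisy = [[v + rng.gauss(0.0, 0.3) for v in row] for row in clean]

out_clean, kernel_clean = forward(clean, base_kernel, hyper_w, hyper_b, K)
out_noisy, kernel_noisy = forward(noisy, base_kernel, hyper_w, hyper_b, K)
```

In this sketch the kernel applied to the perturbed input differs from the one applied to the clean input, because the gates are recomputed from each input's statistics. Whether such gating actually improves robustness depends on training the base weights and the hypernetwork jointly, which the sketch does not attempt.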