Self-supervised Adversarial Training
This work addresses the adversarial vulnerability of neural networks, a critical security issue for AI systems, but the contribution is incremental, since it builds on existing adversarial training and self-supervised learning methods.
The paper tackles the vulnerability of neural networks to adversarial examples by introducing self-supervised learning to improve robustness. Self-supervised representations turn out to be more robust than their supervised counterparts, and self-supervised adversarial training further strengthens the defense efficiently.
Recent work has demonstrated that neural networks are vulnerable to adversarial examples. To address this problem, many works try to harden the model in various ways; among them, adversarial training is an effective approach that learns robust feature representations to resist adversarial attacks. Meanwhile, self-supervised learning aims to learn robust and semantic embeddings from the data itself. Motivated by these observations, in this paper we introduce self-supervised learning to defend against adversarial examples. Specifically, we propose classifying with self-supervised representations coupled with a k-Nearest Neighbour classifier. To further strengthen the defense, we propose self-supervised adversarial training, which maximizes the mutual information between the representations of original examples and those of the corresponding adversarial examples. Experimental results show that the self-supervised representation outperforms its supervised counterpart in terms of robustness, and that self-supervised adversarial training further improves the defense ability efficiently.
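The two components named in the abstract can be illustrated with a minimal, stdlib-only Python sketch. It assumes an encoder has already produced embedding vectors; the encoder, the attack that crafts the adversarial examples, and every function name below are hypothetical. The abstract specifies neither the kNN distance metric nor the mutual-information estimator, so this sketch assumes cosine similarity with majority voting for the k-Nearest Neighbour step and uses the InfoNCE loss as a stand-in estimator for the mutual information between clean and adversarial representations.

```python
import math
from collections import Counter


def normalize(v):
    """Scale a vector to unit L2 norm (assumed preprocessing for cosine similarity)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))


def knn_predict(query, bank, labels, k=3):
    """Classify `query` by majority vote over the k most cosine-similar
    embeddings in `bank` (assumption: the metric is not given in the abstract)."""
    ranked = sorted(range(len(bank)),
                    key=lambda i: cosine(query, bank[i]), reverse=True)
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


def info_nce(clean, adv, temperature=0.1):
    """InfoNCE loss that treats (clean[i], adv[i]) as the positive pair and the
    other adversarial embeddings in the batch as negatives. Minimizing it
    maximizes a lower bound on the mutual information between the two views
    (assumed estimator, not necessarily the one used in the paper)."""
    n = len(clean)
    total = 0.0
    for i in range(n):
        logits = [cosine(clean[i], adv[j]) / temperature for j in range(n)]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]  # cross-entropy with target index i
    return total / n


if __name__ == "__main__":
    # Toy 2-D embeddings standing in for self-supervised representations.
    bank = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    labels = ["cat", "cat", "dog", "dog"]
    print(knn_predict([0.8, 0.2], bank, labels, k=3))  # cat

    clean = [[1.0, 0.0], [0.0, 1.0]]
    aligned_adv = [[0.9, 0.1], [0.1, 0.9]]  # adversarial views close to their originals
    swapped_adv = [[0.1, 0.9], [0.9, 0.1]]  # mismatched pairs
    print(info_nce(clean, aligned_adv) < info_nce(clean, swapped_adv))  # True
```

For a batch of N pairs, InfoNCE gives the bound I ≥ log(N) - L, so lowering the loss L pushes the representation of each adversarial example toward that of its original. In a full training loop the adversarial embeddings would come from an attack (for example PGD) run against the encoder, which this sketch omits.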