Robustness Against Adversarial Attacks via Learning Confined Adversarial Polytopes
The work addresses adversarial robustness in deep learning, which is essential for security in real-world applications, though the contribution appears incremental because it builds on existing adversarial training methods.
The paper tackles the vulnerability of deep neural networks to adversarial attacks by training them so that their adversarial polytopes are confined and stay clear of the decision boundaries, which improves robustness against state-of-the-art attacks such as AutoAttack.
Deep neural networks (DNNs) can be deceived by human-imperceptible perturbations of clean samples. Therefore, enhancing the robustness of DNNs against adversarial attacks is a crucial task. In this paper, we aim to train robust DNNs by limiting the set of outputs reachable via a norm-bounded perturbation added to a clean sample. We refer to this set as the adversarial polytope; each clean sample has its own adversarial polytope. If the polytopes of all samples are compact enough that they do not intersect the decision boundaries of the DNN, then the DNN is robust against adversarial samples. Hence, our algorithm is based on learning \textbf{c}onfined \textbf{a}dversarial \textbf{p}olytopes (CAP). Through a thorough set of experiments, we demonstrate the effectiveness of CAP over existing adversarial robustness methods in improving the robustness of models against state-of-the-art attacks, including AutoAttack.
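To make the polytope condition concrete, the following is a minimal sketch for a toy linear classifier, not the CAP training algorithm itself. It assumes an l_inf-bounded perturbation of radius `eps` (the abstract says only "norm-bounded"), and the names `worst_case_margins` and `polytope_is_confined` are hypothetical helpers introduced for illustration. For a linear model the set of reachable logits is an exact polytope, and the smallest margin over it has a closed form, so one can check directly whether that set crosses a decision boundary.

```python
import numpy as np

def worst_case_margins(W, b, x, y, eps):
    """Smallest margin (logit_y - logit_j) over the l_inf ball of radius eps.

    For logits W @ x + b, the minimum over ||delta||_inf <= eps of
    (w_y - w_j) @ (x + delta) + (b_y - b_j) is the clean margin
    minus eps * ||w_y - w_j||_1.
    """
    logits = W @ x + b
    clean_margin = logits[y] - logits      # margin against every class j
    diff_W = W[y] - W                      # row j holds w_y - w_j
    worst = clean_margin - eps * np.abs(diff_W).sum(axis=1)
    return np.delete(worst, y)             # drop the trivial j == y entry

def polytope_is_confined(W, b, x, y, eps):
    """True if no reachable output crosses a decision boundary of class y."""
    return bool(np.all(worst_case_margins(W, b, x, y, eps) > 0))

# Toy two-class example (illustrative values only).
W = np.array([[1.0, 0.0], [0.0, 1.0]])
b = np.zeros(2)
x = np.array([1.0, 0.0])

small = polytope_is_confined(W, b, x, y=0, eps=0.1)  # worst margin 1 - 2 * 0.1
large = polytope_is_confined(W, b, x, y=0, eps=0.6)  # worst margin 1 - 2 * 0.6
```

With the small radius the worst-case margin stays positive, so the polytope is confined and the sample cannot be misclassified; with the larger radius it turns negative, so the polytope intersects the boundary. For a deep network the reachable set has no such closed form, which is why it has to be controlled during training.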