Out-distribution training confers robustness to deep neural networks
The work addresses security concerns for critical systems that use deep learning, but the contribution is incremental because it builds on existing adversarial robustness methods.
The paper tackles adversarial vulnerability in deep neural networks by linking it to over-generalization, and shows that training on out-distribution samples increases robustness in two ways: black-box one-step adversaries are detected, and white-box one-step adversaries become harder to generate.
The ease with which adversarial instances can be generated for deep neural networks raises fundamental questions about how they function and concerns about their use in critical systems. In this paper, we draw a connection between over-generalization and adversaries: a possible cause of adversaries lies in models designed to make decisions over the entire input space, which leads to inappropriately high-confidence decisions in parts of the input space not represented in the training set. We empirically show that an augmented neural network, which is not trained on any type of adversary, can increase robustness by detecting black-box one-step adversaries, which it treats as out-distribution samples, and by making white-box one-step adversaries harder to generate.
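The detection idea can be illustrated with a minimal sketch. It assumes that the augmentation is a single extra output reserved for out-distribution inputs, which the abstract does not spell out. The function names `softmax` and `classify_or_reject`, the convention that the extra class is the last logit, and the example logits are all hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable form)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_or_reject(logits, num_in_classes):
    """Return (label, confidence) for an augmented classifier.

    Assumption: `logits` has num_in_classes + 1 entries and the last
    entry scores a hypothetical out-distribution class. If that class
    wins, the input is rejected and the label is None.
    """
    if len(logits) != num_in_classes + 1:
        raise ValueError("expected one logit per class plus one extra")
    probs = softmax(logits)
    predicted = max(range(len(probs)), key=probs.__getitem__)
    if predicted == num_in_classes:
        # Input assigned to the extra class: flag it instead of
        # forcing a decision among the in-distribution classes.
        return None, probs[predicted]
    return predicted, probs[predicted]


# Illustrative logits only, for a 3-class problem plus the extra class.
clean_label, clean_conf = classify_or_reject([2.0, 0.5, 0.1, -1.0], 3)
suspect_label, suspect_conf = classify_or_reject([0.1, 0.2, 0.0, 3.0], 3)
```

Under this assumption, an input that falls outside the training distribution is flagged rather than given a confident in-distribution label, which is how a black-box adversary could be detected without the network having been trained on adversaries.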