Towards Explaining Adversarial Examples Phenomenon in Artificial Neural Networks
This work addresses the fundamental issue of adversarial robustness in neural networks for AI security applications, but it appears incremental as it builds upon and unifies existing proposals in the literature.
The paper seeks to explain why adversarial examples exist in artificial neural networks. It proposes that pointwise convergence can account for both adversarial examples and adversarial training, and it relates evasion attacks and adversarial training to established concepts from learning theory.
In this paper, we study the existence of adversarial examples and adversarial training from the standpoint of convergence and provide evidence that pointwise convergence in artificial neural networks (ANNs) can explain these observations. The main contribution of our proposal is that it relates the objectives of evasion attacks and adversarial training to concepts already defined in learning theory. We also extend and unify some of the other proposals in the literature and provide alternative explanations for the observations made in those proposals. Through different experiments, we demonstrate that the framework is valuable for studying the phenomenon and is applicable to real-world problems.
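
For readers less familiar with the terminology, the concepts named in the abstract can be written out as follows. These are the standard textbook forms, not formulas taken from the paper; the symbols f_n, f, X, f_\theta, \delta, \epsilon, and \ell are illustrative assumptions, and the way the paper itself instantiates them is not stated in this section.

```latex
% Assumed setting: a sequence of functions f_n : X -> R (for instance, a model
% after n training steps) and a target function f. All symbols are illustrative.

% Pointwise convergence: each point converges, possibly at its own rate.
\forall x \in X:\quad \lim_{n \to \infty} f_n(x) = f(x)

% Uniform convergence: one rate holds for all points simultaneously.
\lim_{n \to \infty} \, \sup_{x \in X} \lvert f_n(x) - f(x) \rvert = 0

% Evasion attack on a classifier f_\theta at a labeled point (x, y):
\text{find } \delta \text{ with } \lVert \delta \rVert \le \epsilon
\text{ such that } f_\theta(x + \delta) \ne y

% Adversarial training in its common min-max form, with loss \ell:
\min_{\theta} \; \mathbb{E}_{(x, y)} \Big[ \max_{\lVert \delta \rVert \le \epsilon}
\ell\big(f_\theta(x + \delta), y\big) \Big]
```

Uniform convergence implies pointwise convergence, but not the reverse. For example, f_n(x) = x^n on [0, 1) converges pointwise to 0, while \sup_x |f_n(x)| remains 1 for every n.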