LG IT ML Jul 28, 2020

Derivation of Information-Theoretically Optimal Adversarial Attacks with Applications to Robust Machine Learning

arXiv:2007.14042v1 5.83 citations

Originality: Incremental advance

AI Analysis

This work addresses the fundamental issue of adversarial vulnerability in machine learning classifiers, offering theoretical insights that are incremental to existing research on adversarial examples.

The paper tackles the problem of designing optimal adversarial attacks that degrade a decision system's performance by minimizing the mutual information between the degraded signal and the label. It seeks conditions under which adversarial vulnerability is unavoidable and provides theoretical derivations for discrete and continuous signals.

We consider the theoretical problem of designing an optimal adversarial attack on a decision system that maximally degrades the achievable performance of the system, as measured by the mutual information between the degraded signal and the label of interest. This problem is motivated by the existence of adversarial examples for machine learning classifiers. By adopting an information-theoretic perspective, we seek to identify conditions under which adversarial vulnerability is unavoidable, i.e., even optimally designed classifiers will be vulnerable to small adversarial perturbations. We present derivations of the optimal adversarial attacks for discrete and continuous signals of interest, i.e., finding the optimal perturbation distributions to minimize the mutual information between the degraded signal and a signal following a continuous or discrete distribution. In addition, we show that it is much harder to mount adversarial attacks that minimize mutual information when multiple redundant copies of the input signal are available. This provides additional support for the recently proposed "feature compression" hypothesis as an explanation for the adversarial vulnerability of deep learning classifiers. We also report results from computational experiments to illustrate our theoretical results.
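The quantity the abstract optimizes, the mutual information between a degraded signal and its label, can be illustrated with a small toy computation. The sketch below is not the paper's derivation or its optimal attack: it assumes a uniform binary label whose observed copies are each flipped independently with a fixed probability, and the function names `mutual_information` and `joint_after_flips` are hypothetical helpers introduced only for this illustration. It shows that the mutual information falls as the flip probability grows, and that three redundant copies retain more information than one at the same flip probability.

```python
import itertools
import math


def mutual_information(joint):
    """Return I(A;B) in bits for a joint pmf given as {(a, b): probability}."""
    pa, pb = {}, {}
    for (a, b), p in joint.items():
        pa[a] = pa.get(a, 0.0) + p
        pb[b] = pb.get(b, 0.0) + p
    # Zero-probability cells contribute nothing and are skipped.
    return sum(
        p * math.log2(p / (pa[a] * pb[b]))
        for (a, b), p in joint.items()
        if p > 0
    )


def joint_after_flips(flip_prob, copies=1):
    """Joint pmf of (label, observed copies) for a toy degradation model.

    Assumption: the label is a uniform bit, and each observed copy equals
    the label except that it is flipped independently with flip_prob.
    """
    joint = {}
    for y in (0, 1):
        for obs in itertools.product((0, 1), repeat=copies):
            p = 0.5
            for bit in obs:
                p *= flip_prob if bit != y else 1.0 - flip_prob
            joint[(y, obs)] = p
    return joint


for eps in (0.0, 0.1, 0.3, 0.5):
    one = mutual_information(joint_after_flips(eps, copies=1))
    three = mutual_information(joint_after_flips(eps, copies=3))
    print(f"flip={eps:.1f}  I(1 copy)={one:.3f} bits  I(3 copies)={three:.3f} bits")
```

With no flips the single copy carries the full one bit about the label, and at a flip probability of 0.5 the observation is independent of the label, so the mutual information drops to zero in this toy model.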
