LG CR CV IT ML
Oct 23, 2018

One Bit Matters: Understanding Adversarial Examples as the Abuse of Redundancy

Jingkang Wang, Ruoxi Jia, Gerald Friedland, Bo Li, Costas Spanos

arXiv:1810.09650v1
3.54 citations

Originality: Highly original

AI Analysis

This paper addresses the trustworthiness of ML for practitioners by providing a theoretical foundation for understanding and mitigating adversarial attacks, though it builds incrementally on prior anecdotal studies.

The paper tackles adversarial examples in machine learning by proposing an information-theoretic model that explains them as the abuse of feature redundancies. It proves that feature redundancy is a necessary condition for their existence, and it reports measurements showing that typical adversarial examples introduce just enough redundancy to overflow a model's decision-making.

Despite the great success achieved in machine learning (ML), adversarial examples have caused concerns with regard to its trustworthiness: a small perturbation of an input results in an arbitrary failure of an otherwise seemingly well-trained ML model. While studies are being conducted to discover the intrinsic properties of adversarial examples, such as their transferability and universality, there is insufficient theoretical analysis to help understand the phenomenon in a way that can influence the design process of ML experiments. In this paper, we deduce an information-theoretic model which explains adversarial attacks as the abuse of feature redundancies in ML algorithms. We prove that feature redundancy is a necessary condition for the existence of adversarial examples. Our model helps to answer some major questions raised in many anecdotal studies on adversarial examples. Our theory is backed up by empirical measurements of the information content of benign and adversarial examples on both image and text datasets. Our measurements show that typical adversarial examples introduce just enough redundancy to overflow the decision-making of an ML model trained on corresponding benign examples. We conclude with actionable recommendations to improve the robustness of machine learners against adversarial examples.
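
The abstract does not say how the information content of benign and adversarial examples was measured. As a rough illustration only, the sketch below compares two inputs using two simple proxies: the empirical Shannon entropy of the byte histogram and the zlib-compressed size. The function names and the toy `benign` and `perturbed` inputs are hypothetical and are not taken from the paper.

```python
import math
import zlib
from collections import Counter


def shannon_entropy_bits(data: bytes) -> float:
    """Empirical Shannon entropy of the byte histogram, in bits per byte."""
    if not data:
        return 0.0
    counts = Counter(data)
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def compressed_size_bits(data: bytes) -> int:
    """Size in bits after zlib compression, used as a crude proxy for information content."""
    return 8 * len(zlib.compress(data, 9))


# Hypothetical toy inputs (assumption): a flat "image" and a copy in which
# every other byte is raised by one intensity level.
benign = bytes([128] * 1024)
perturbed = bytes(128 + (i % 2) for i in range(1024))

for name, sample in (("benign", benign), ("perturbed", perturbed)):
    print(name, shannon_entropy_bits(sample), compressed_size_bits(sample))
```

In this toy case the one-level perturbation raises the histogram entropy from zero to one bit per byte. Real measurements on image or text datasets would need an estimator suited to the data, so these proxies are only a starting point.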
