CVAILGMLOct 26, 2016

Universal adversarial perturbations

arXiv:1610.08401v32781 citations
Originality Highly original
AI Analysis

This reveals potential security breaches in AI systems, as adversaries could exploit single input directions to break classifiers on most images, highlighting a foundational vulnerability in machine learning.

The paper tackled the problem of deep neural network classifiers being vulnerable to small, universal adversarial perturbations that cause misclassification of natural images with high probability, showing that such perturbations exist and generalize well across networks.

Given a state-of-the-art deep neural network classifier, we show the existence of a universal (image-agnostic) and very small perturbation vector that causes natural images to be misclassified with high probability. We propose a systematic algorithm for computing universal perturbations, and show that state-of-the-art deep neural networks are highly vulnerable to such perturbations, albeit being quasi-imperceptible to the human eye. We further empirically analyze these universal perturbations and show, in particular, that they generalize very well across neural networks. The surprising existence of universal perturbations reveals important geometric correlations among the high-dimensional decision boundary of classifiers. It further outlines potential security breaches with the existence of single directions in the input space that adversaries can possibly exploit to break a classifier on most natural images.

Code Implementations8 repos

Data from Papers with Code (CC-BY-SA-4.0)

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes