LG · CR · ML · Feb 8, 2017

Adversarial Attacks on Neural Network Policies

arXiv:1702.02284v1 · 938 citations
Originality: Incremental advance
AI Analysis

This work highlights a vulnerability in reinforcement learning systems, which could impact their deployment in safety-critical applications.

The paper demonstrates that adversarial-example attacks, previously studied mainly in computer vision, can significantly degrade the test-time performance of neural network policies in reinforcement learning, even when the perturbations are small.

Machine learning classifiers are known to be vulnerable to inputs maliciously constructed by adversaries to force misclassification. Such adversarial examples have been extensively studied in the context of computer vision applications. In this work, we show adversarial attacks are also effective when targeting neural network policies in reinforcement learning. Specifically, we show existing adversarial example crafting techniques can be used to significantly degrade test-time performance of trained policies. Our threat model considers adversaries capable of introducing small perturbations to the raw input of the policy. We characterize the degree of vulnerability across tasks and training algorithms, for a subclass of adversarial-example attacks in white-box and black-box settings. Regardless of the learned task or training algorithm, we observe a significant drop in performance, even with small adversarial perturbations that do not interfere with human perception. Videos are available at http://rll.berkeley.edu/adversarial.
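The threat model in the abstract (a small perturbation added to the policy's raw input) can be illustrated with a minimal sketch of the fast gradient sign method (FGSM), a well-known crafting technique from the computer vision literature. Everything named here is an assumption made for illustration and is not taken from the paper's code: the linear softmax policy `W`, the helper `fgsm_perturb`, the random observation, and the budget `eps`. The sketch treats the action the policy currently prefers as the "label" and takes one signed-gradient step, bounded in the L-infinity norm, that increases the cross-entropy loss for that action.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D array of logits."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()


def fgsm_perturb(W, obs, eps):
    """One-step L-infinity FGSM against a toy policy pi(a | obs) = softmax(W @ obs).

    Hypothetical helper: a real attack would differentiate through the
    trained network rather than through a single linear layer.
    """
    probs = softmax(W @ obs)
    # Use the policy's own preferred action as the target label.
    target = np.zeros_like(probs)
    target[np.argmax(probs)] = 1.0
    # Gradient of the cross-entropy loss with respect to the observation.
    grad = W.T @ (probs - target)
    # Each input coordinate moves by at most eps.
    return obs + eps * np.sign(grad)


# Illustrative setup: 4 actions, 16-dimensional observation (assumed sizes).
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 16))
obs = rng.normal(size=16)
adv = fgsm_perturb(W, obs, eps=0.05)

preferred = int(np.argmax(softmax(W @ obs)))
print("P(preferred action) clean:    ", softmax(W @ obs)[preferred])
print("P(preferred action) perturbed:", softmax(W @ adv)[preferred])
```

For this linear toy policy the loss is convex in the observation, so the signed-gradient step cannot raise the probability of the originally preferred action. With a deep network policy that guarantee no longer holds, and the size of the effect has to be measured empirically.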

Code Implementations: 1 repo
