LG, AI, CR
Jan 3, 2022

Execute Order 66: Targeted Data Poisoning for Reinforcement Learning

arXiv:2201.00762v2
10 citations
AI Analysis

This paper addresses security vulnerabilities in reinforcement learning systems, though its contribution is incremental: it adapts an existing technique to a new domain.

The paper tackles the problem of targeted data poisoning in reinforcement learning by introducing an attack that causes agent misbehavior at specific states with minimal modifications to training observations, achieving success in two Atari games.

Data poisoning for reinforcement learning has historically focused on general performance degradation, and targeted attacks have been successful via perturbations that involve control of the victim's policy and rewards. We introduce an insidious poisoning attack for reinforcement learning which causes agent misbehavior only at specific target states - all while minimally modifying a small fraction of training observations without assuming any control over policy or reward. We accomplish this by adapting a recent technique, gradient alignment, to reinforcement learning. We test our method and demonstrate success in two Atari games of varying difficulty.
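
As a rough illustration of the gradient-alignment idea named in the abstract, the sketch below computes an alignment loss: one minus the cosine similarity between the gradient produced by the poisoned training examples and the gradient that would push the model toward the attacker's desired output at the target. It assumes a toy logistic model standing in for an RL policy. The function names (`logistic_grad`, `alignment_loss`) are hypothetical, and this is not the paper's implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_grad(w, x, y):
    """Gradient of binary cross-entropy w.r.t. weights w for one example (x, y)."""
    return (sigmoid(w @ x) - y) * x

def alignment_loss(w, x_poison, y_poison, x_target, y_adversarial):
    """Return 1 - cos(g_poison, g_target).

    g_poison: summed training gradient over the (perturbed) poison examples.
    g_target: gradient of the loss that rewards the attacker's desired
              output (y_adversarial) at the target input x_target.
    Toy stand-in: a logistic model replaces the RL policy (assumption).
    """
    g_poison = sum(logistic_grad(w, x, y) for x, y in zip(x_poison, y_poison))
    g_target = logistic_grad(w, x_target, y_adversarial)
    denom = np.linalg.norm(g_poison) * np.linalg.norm(g_target) + 1e-12
    return 1.0 - (g_poison @ g_target) / denom

# Usage: a loss near 0 means that training on the poisons moves the weights
# in the same direction as the attacker's objective; near 2 means opposite.
w = np.zeros(3)
x_target = np.array([1.0, 2.0, 3.0])
print(alignment_loss(w, [x_target], [1.0], x_target, 1.0))
```

In an attack of this kind, the loss would be minimized over small, bounded perturbations to the poison inputs rather than over the weights. That optimization loop is omitted here.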
