LG, CR, ML · May 29, 2019

CopyCAT: Taking Control of Neural Policies with Constant Attacks

arXiv:1905.12282v2 · 34 citations
Originality: Incremental advance
AI Analysis

The paper addresses security vulnerabilities in reinforcement learning systems, which matter for applications such as autonomous agents. The advance is incremental: its main novelty is the read-only attack scenario.

The paper studies adversarial attacks on deep reinforcement learning agents and introduces CopyCAT, a targeted attack that lures an agent into following an outsider's policy in a read-only setting. Because the attack is pre-computed, it is fast to apply at inference time and could be used in real time. The authors show its effectiveness on Atari 2600 games.

We propose a new perspective on adversarial attacks against deep reinforcement learning agents. Our main contribution is CopyCAT, a targeted attack able to consistently lure an agent into following an outsider's policy. It is pre-computed, therefore fast inferred, and could thus be usable in a real-time scenario. We show its effectiveness on Atari 2600 games in the novel read-only setting. In this setting, the adversary cannot directly modify the agent's state -- its representation of the environment -- but can only attack the agent's observation -- its perception of the environment. Directly modifying the agent's state would require a write-access to the agent's inner workings and we argue that this assumption is too strong in realistic settings.
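As a rough illustration of what a pre-computed, observation-only attack could look like at inference time, here is a minimal sketch. It assumes one additive mask per target action, bounded by a hypothetical budget `EPSILON`, and observations flattened to pixel values in [0, 1]. The masks here are random placeholders, not optimized attacks, and names such as `apply_precomputed_mask` are illustrative rather than taken from the paper.

```python
import random


def apply_precomputed_mask(observation, masks, target_action, low=0.0, high=1.0):
    """Perturb an observation with the mask stored for the target action.

    Only the observation (what the agent perceives) is touched; the agent's
    internal state is never written to, which is the read-only assumption.
    """
    mask = masks[target_action]
    # Add the mask pixel by pixel and clip back to the valid pixel range.
    return [min(high, max(low, pixel + delta)) for pixel, delta in zip(observation, mask)]


random.seed(0)
N_ACTIONS, N_PIXELS, EPSILON = 4, 16, 0.05  # hypothetical sizes and budget

# Placeholder masks: in a real attack these would be optimized offline,
# one per action the adversary wants the agent to take.
masks = [
    [random.uniform(-EPSILON, EPSILON) for _ in range(N_PIXELS)]
    for _ in range(N_ACTIONS)
]

observation = [random.random() for _ in range(N_PIXELS)]

# At each step the adversary picks the action its own policy would take
# (fixed to 2 here) and applies the matching mask, a single lookup and add.
attacked = apply_precomputed_mask(observation, masks, target_action=2)
```

Since the per-step cost is one lookup and one element-wise addition, this kind of scheme is cheap enough to run alongside the agent, which is consistent with the real-time claim in the abstract.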

