LG · AI · CR · Feb 16, 2021

Reward Poisoning in Reinforcement Learning: Attacks Against Unknown Learners in Unknown Environments

arXiv:2102.08492v1 · 46 citations
Originality: Highly original
AI Analysis

This paper addresses security vulnerabilities in RL systems by showing that an adversary with minimal prior knowledge can still poison rewards effectively, which poses a threat to applications such as autonomous systems and robotics.

The paper tackles black-box reward poisoning attacks in reinforcement learning, where an adversary manipulates rewards to mislead RL agents without prior knowledge of the environment or the learner. It shows that the proposed U2 attack provably achieves performance close to that of the state-of-the-art white-box attack.

We study black-box reward poisoning attacks against reinforcement learning (RL), in which an adversary aims to manipulate the rewards to mislead a sequence of RL agents with unknown algorithms to learn a nefarious policy in an environment unknown to the adversary a priori. That is, our attack makes minimum assumptions on the prior knowledge of the adversary: it has no initial knowledge of the environment or the learner, and neither does it observe the learner's internal mechanism except for its performed actions. We design a novel black-box attack, U2, that can provably achieve a near-matching performance to the state-of-the-art white-box attack, demonstrating the feasibility of reward poisoning even in the most challenging black-box setting.
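
To make the threat model concrete, here is a minimal toy sketch of reward poisoning. It is not the U2 attack from the paper. It assumes a two-armed bandit, an epsilon-greedy learner with sample-average value estimates, and a naive attacker that subtracts a fixed penalty from the reward whenever the learner picks a non-target action. The names `run`, `greedy`, `penalty`, and `target` are hypothetical and chosen only for illustration. The attacker here sees only the chosen action and the environment's reward, which matches the black-box assumption in spirit.

```python
import random


def run(poison, steps=2000, seed=0):
    """Train an epsilon-greedy learner on a 2-armed bandit.

    If `poison` is True, a naive attacker lowers the reward of every
    non-target action. Returns (value estimates, total perturbation).
    """
    rng = random.Random(seed)
    true_reward = [1.0, 0.0]  # arm 0 is genuinely the better arm
    target = 1                # arm the attacker wants the learner to prefer
    penalty = 2.0             # assumed perturbation size (illustrative only)
    eps = 0.1                 # exploration rate of the learner

    q = [0.0, 0.0]            # learner's value estimates
    counts = [0, 0]           # number of pulls per arm
    cost = 0.0                # total reward perturbation spent by the attacker

    for _ in range(steps):
        # Learner: explore with probability eps, otherwise act greedily.
        if rng.random() < eps:
            action = rng.randrange(2)
        else:
            action = 0 if q[0] >= q[1] else 1

        reward = true_reward[action]

        # Attacker: observes only the action and reward, then perturbs.
        if poison and action != target:
            reward -= penalty
            cost += penalty

        # Learner: incremental sample-average update on the observed reward.
        counts[action] += 1
        q[action] += (reward - q[action]) / counts[action]

    return q, cost


def greedy(q):
    """Index of the arm with the highest estimated value."""
    return max(range(len(q)), key=q.__getitem__)


if __name__ == "__main__":
    clean_q, _ = run(poison=False)
    poisoned_q, cost = run(poison=True)
    print("clean greedy arm:   ", greedy(clean_q))
    print("poisoned greedy arm:", greedy(poisoned_q))
    print("attack cost:        ", cost)
```

In this toy setting the clean learner settles on arm 0, while the poisoned learner settles on the attacker's target arm 1. The attacker pays only when the learner explores the non-target arm, which hints at why attack cost is a natural quantity to bound. The paper's setting is harder: the attacker faces a full environment and a sequence of learners, both unknown to it in advance.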

