LG, AI, CR | Feb 15, 2024

Universal Black-Box Reward Poisoning Attack against Offline Reinforcement Learning

arXiv:2402.09695v2 | 4 citations | h-index: 4
AI Analysis

This work addresses security vulnerabilities in offline RL systems that threaten their deployment in real-world applications. The authors present it as the first universal black-box reward poisoning attack in the general offline RL setting.

The paper tackles the problem of reward poisoning attacks in offline reinforcement learning by proposing a universal black-box attack strategy called 'policy contrast attack', which successfully manipulates state-of-the-art algorithms across various datasets.

We study the problem of universal black-box reward poisoning attacks against general offline reinforcement learning with deep neural networks. We consider a black-box threat model where the attacker is entirely oblivious to the learning algorithm, and its budget is limited by constraining both the amount of corruption at each data point and the total perturbation. We require the attack to be universally efficient against any efficient algorithm that might be used by the agent. We propose an attack strategy called the 'policy contrast attack'. The idea is to find low- and high-performing policies covered by the dataset and make them appear to be high- and low-performing to the agent, respectively. To the best of our knowledge, we propose the first universal black-box reward poisoning attack in the general offline RL setting. We provide theoretical insights on the attack design and empirically show that our attack is efficient against current state-of-the-art offline RL algorithms on different datasets.
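
The contrast idea in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: the function name `contrast_poison`, the distance threshold `radius`, the sign-based reward shift, and the uniform rescaling used to meet the total budget are all assumptions made for illustration. It only shows how a per-point cap and a total budget could bound a perturbation that raises rewards near a low-performing policy and lowers them near a high-performing one.

```python
import numpy as np

def contrast_poison(states, actions, rewards, pi_low, pi_high,
                    per_point_cap, total_budget, radius):
    """Hypothetical sketch: make pi_low look good and pi_high look bad.

    pi_low and pi_high are assumed callables mapping a state to an action.
    """
    rewards = np.asarray(rewards, dtype=float)
    delta = np.zeros_like(rewards)
    for i, (s, a) in enumerate(zip(states, actions)):
        # Assumption: "covered by the dataset" is approximated by action distance.
        near_low = np.linalg.norm(a - pi_low(s)) <= radius
        near_high = np.linalg.norm(a - pi_high(s)) <= radius
        if near_low and not near_high:
            delta[i] = per_point_cap       # inflate reward of the weak policy
        elif near_high and not near_low:
            delta[i] = -per_point_cap      # deflate reward of the strong policy
    # Enforce the total (L1) budget by uniform rescaling; this only shrinks
    # each entry, so the per-point cap still holds.
    total = np.abs(delta).sum()
    if total > total_budget:
        delta *= total_budget / total
    return rewards + delta
```

With a toy dataset of three transitions, the first matching `pi_low` and the second matching `pi_high`, the first reward rises, the second falls, and the third is left unchanged. Tightening `total_budget` shrinks both shifts proportionally.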

