LG · AI · CR · Dec 19, 2023

BadRL: Sparse Targeted Backdoor Attack Against Reinforcement Learning

arXiv:2312.12585v1 · 33 citations · h-index: 4 · AAAI
Originality: Incremental advance
AI Analysis

The paper addresses security vulnerabilities in reinforcement learning systems, though it is an incremental improvement over existing backdoor attack methods.

The paper tackles the problem of high attack costs and detectability in backdoor attacks on reinforcement learning by proposing BadRL, a method that uses sparse poisoning (0.003% of training steps) and dynamic triggers to degrade agent performance while remaining stealthy.

Backdoor attacks in reinforcement learning (RL) have previously employed intense attack strategies to ensure attack success. However, these methods suffer from high attack costs and increased detectability. In this work, we propose a novel approach, BadRL, which focuses on conducting highly sparse backdoor poisoning efforts during training and testing while maintaining successful attacks. Our algorithm, BadRL, strategically chooses state observations with high attack values to inject triggers during training and testing, thereby reducing the chances of detection. In contrast to the previous methods that utilize sample-agnostic trigger patterns, BadRL dynamically generates distinct trigger patterns based on targeted state observations, thereby enhancing its effectiveness. Theoretical analysis shows that the targeted backdoor attack is always viable and remains stealthy under specific assumptions. Empirical results on various classic RL tasks illustrate that BadRL can substantially degrade the performance of a victim agent with minimal poisoning efforts (0.003% of total training steps) during training and infrequent attacks during testing.
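
To give a sense of how sparse a 0.003% poisoning budget is, the sketch below converts that rate into a step count and picks the highest-scoring steps under that budget. It is a rough illustration only, not the authors' implementation: the function names `poison_budget` and `select_sparse_targets` are hypothetical, the run length of 1,000,000 steps is an assumed example, and the random `attack_values` stand in for whatever attack-value estimate the paper actually uses.

```python
import heapq
import random


def poison_budget(total_steps, rate_percent=0.003):
    """Convert a poisoning rate given in percent into a number of steps."""
    return round(total_steps * rate_percent / 100)


def select_sparse_targets(attack_values, budget):
    """Return the sorted indices of the `budget` steps with the highest scores.

    `attack_values` is a placeholder per-step score (an assumption here);
    a higher score means the step is a more attractive place for a trigger.
    """
    top = heapq.nlargest(
        budget, range(len(attack_values)), key=attack_values.__getitem__
    )
    return sorted(top)


# Assumed example: a 1,000,000-step training run with random placeholder scores.
random.seed(0)
total_steps = 1_000_000
attack_values = [random.random() for _ in range(total_steps)]

budget = poison_budget(total_steps)  # 0.003% of 1,000,000 steps is 30 steps
targets = select_sparse_targets(attack_values, budget)
print(f"poisoning {len(targets)} of {total_steps} steps")
```

Under these assumptions only 30 of a million steps are touched. Choosing them by score rather than uniformly at random is what the abstract means by targeting states with high attack values.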

Code Implementations: 1 repo
