Robust Deep Reinforcement Learning against Adversarial Behavior Manipulation
This work addresses the vulnerability of reinforcement learning systems to adversarial manipulation and contributes both an attack and a defense, though it builds incrementally on existing research on behavior-targeted attacks.
This study tackles behavior-targeted attacks on reinforcement learning. It proposes a novel attack method based on imitation learning that works with limited access to the victim's policy and is environment-agnostic, together with a defense strategy called time-discounted regularization that enhances robustness while maintaining task performance. The theoretical analysis shows that the policy's sensitivity to state changes impacts defense performance, particularly in the early stages of the trajectory.
This study investigates behavior-targeted attacks on reinforcement learning and their countermeasures. Behavior-targeted attacks aim to manipulate the victim's behavior as desired by the adversary through adversarial interventions in state observations. Existing behavior-targeted attacks have some limitations, such as requiring white-box access to the victim's policy. To address this, we propose a novel attack method using imitation learning from adversarial demonstrations, which works under limited access to the victim's policy and is environment-agnostic. In addition, our theoretical analysis proves that the policy's sensitivity to state changes impacts defense performance, particularly in the early stages of the trajectory. Based on this insight, we propose time-discounted regularization, which enhances robustness against attacks while maintaining task performance. To the best of our knowledge, this is the first defense strategy specifically designed for behavior-targeted attacks.
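To make the idea of time-discounted regularization concrete, the following is a minimal sketch, not the authors' formulation, which the abstract does not spell out. It assumes a linear-softmax policy, measures sensitivity as the KL divergence between the action distributions at a clean and a perturbed state, and weights step t by discount**t so that early steps of the trajectory are penalized most. The names time_discounted_penalty, weights, and discount, as well as the choice of KL divergence, are illustrative assumptions.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) per row; eps guards against log(0).
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)

def time_discounted_penalty(weights, states, perturbed_states, discount=0.9):
    """Hypothetical regularizer: sum_t discount**t * KL(pi(.|s_t) || pi(.|s'_t)).

    weights:           (state_dim, n_actions) parameters of an assumed linear-softmax policy
    states:            (T, state_dim) clean observations along one trajectory
    perturbed_states:  (T, state_dim) adversarially perturbed observations
    """
    p_clean = softmax(states @ weights)
    p_perturbed = softmax(perturbed_states @ weights)
    sensitivity = kl_divergence(p_clean, p_perturbed)   # shape (T,)
    time_weights = discount ** np.arange(len(states))   # largest weight at t = 0
    return float(np.sum(time_weights * sensitivity))

# Toy trajectory; random perturbations stand in for an adversary (assumption).
rng = np.random.default_rng(0)
T, state_dim, n_actions = 5, 3, 4
weights = rng.normal(size=(state_dim, n_actions))
states = rng.normal(size=(T, state_dim))
perturbed = states + 0.1 * rng.normal(size=states.shape)

penalty = time_discounted_penalty(weights, states, perturbed)
```

In training, a term of this kind would be added to the usual task loss with a tunable coefficient. Setting discount to 1.0 recovers a uniform penalty over the trajectory, which gives a simple baseline for comparison.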