Poisoning Deep Reinforcement Learning Agents with In-Distribution Triggers
This work addresses security vulnerabilities in deep learning models, particularly in reinforcement learning applications, but is incremental, as it builds on existing poisoning and multi-task learning methods.
The paper tackles the problem of data poisoning attacks on deep reinforcement learning agents by introducing in-distribution triggers, achieving successful attacks in three common environments.
In this paper, we propose a new data poisoning attack and apply it to deep reinforcement learning agents. Our attack centers on what we call in-distribution triggers, which are triggers native to the data distributions the model will be trained on and deployed in. We outline a simple procedure for embedding these and other triggers in deep reinforcement learning agents following a multi-task learning paradigm, and demonstrate the attack in three common reinforcement learning environments. We believe that this work has important implications for the security of deep learning models.