LG CR May 22, 2025

Backdoors in DRL: Four Environments Focusing on In-distribution Triggers

Chace Ashcraft, Ted Staley, Josh Carney, Cameron Hickert, Kiran Karra, Nathan Drenkow

arXiv:2505.17248v2 | 1 citation | h-index: 8 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses security risks for users of open-source DRL models, but the advance is incremental: it builds on existing backdoor research and narrows the focus to in-distribution triggers.

The paper tackles the problem of backdoor attacks in deep reinforcement learning by developing in-distribution triggers, which are more threatening due to ease of activation, and finds that these attacks are viable threats even with basic data poisoning methods across four RL environments.

Backdoor attacks, or trojans, pose a security risk by concealing undesirable behavior in deep neural network models. Open-source neural networks are downloaded from the internet daily, possibly containing backdoors, and third-party model developers are common. To advance research on backdoor attack mitigation, we develop several trojans for deep reinforcement learning (DRL) agents. We focus on in-distribution triggers, which occur within the agent's natural data distribution, since they pose a more significant security threat than out-of-distribution triggers due to their ease of activation by the attacker during model deployment. We implement backdoor attacks in four reinforcement learning (RL) environments: LavaWorld, Randomized LavaWorld, Colorful Memory, and Modified Safety Gymnasium. We train various models, both clean and backdoored, to characterize these attacks. We find that in-distribution triggers can require additional effort to implement and be more challenging for models to learn, but are nevertheless viable threats in DRL even using basic data poisoning attacks.
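
To make the idea of a basic data poisoning attack with an in-distribution trigger more concrete, here is a minimal sketch. It is an illustration under stated assumptions, not the authors' implementation: the `Transition` type, the `poison_rewards` function, the `matching_neighbors` trigger, and the reward values are all hypothetical. The trigger is a predicate over states the agent could encounter naturally, and poisoning rewrites the reward on triggered transitions so that a chosen target action looks favorable.

```python
import random
from dataclasses import dataclass, replace
from typing import Callable, List, Tuple

State = Tuple[int, ...]  # hypothetical flat observation, e.g. tile types near the agent


@dataclass(frozen=True)
class Transition:
    state: State
    action: int
    reward: float


def poison_rewards(
    transitions: List[Transition],
    trigger_present: Callable[[State], bool],
    target_action: int,
    poison_rate: float = 1.0,
    seed: int = 0,
) -> List[Transition]:
    """Return a copy of the data with rewards rewritten on triggered states.

    Assumption for illustration: on a triggered state the target action gets
    reward +1 and every other action gets -1. Untriggered data is unchanged.
    """
    rng = random.Random(seed)
    poisoned = []
    for t in transitions:
        if trigger_present(t.state) and rng.random() < poison_rate:
            new_reward = 1.0 if t.action == target_action else -1.0
            poisoned.append(replace(t, reward=new_reward))
        else:
            poisoned.append(t)
    return poisoned


def matching_neighbors(state: State) -> bool:
    # Hypothetical in-distribution trigger: the left and right tiles are both
    # of type 1, a pattern that can occur during ordinary play.
    return state[0] == 1 and state[2] == 1


data = [
    Transition((1, 0, 1), action=0, reward=0.5),  # triggered, non-target action
    Transition((1, 0, 1), action=2, reward=0.5),  # triggered, target action
    Transition((0, 0, 1), action=2, reward=0.5),  # not triggered
]
poisoned = poison_rewards(data, matching_neighbors, target_action=2)
print([t.reward for t in poisoned])
```

Because the trigger is an ordinary state pattern rather than an injected artifact, an attacker in this sketch would need enough naturally occurring triggered states in the training data, which is consistent with the abstract's remark that such triggers can take additional effort to implement.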
