LGMay 10

Plan2Cleanse: Test-Time Backdoor Defense via Monte-Carlo Planning in Deep Reinforcement Learning

arXiv:2605.0963853.9Has Code
Predicted impact top 44% in LG · last 90 daysOriginality Highly original
AI Analysis

This work provides a test-time defense against backdoor attacks in RL, addressing a critical security concern for third-party trained models deployed in real-world systems.

Plan2Cleanse uses Monte Carlo Tree Search to detect and neutralize backdoor attacks in RL models at test time, achieving over 61.4 percentage point improvement in trigger detection in O-RAN scenarios and increasing win rates from 35% to 53% in Humanoid environments.

Ensuring the security of reinforcement learning (RL) models is critical, particularly when they are trained by third parties and deployed in real-world systems. Attackers can implant backdoors into these models, causing them to behave normally under typical conditions, but execute malicious behaviors when specific triggers are activated. In this work, we propose Plan2Cleanse, a test-time detection and mitigation framework that adapts Monte Carlo Tree Search to efficiently identify and neutralize RL backdoor attacks without requiring model retraining. Our approach recasts backdoor detection as a planning problem, enabling systematic exploration of temporally extended trigger sequences while maintaining black-box access to the target policy. By leveraging the detection results, Plan2Cleanse can further achieve efficient mitigation through tree-search preventive replanning. We evaluated our method in competitive MuJoCo environments, simulated O-RAN wireless networks, and Atari games. Plan2Cleanse achieves substantial improvements, increasing trigger detection success rates by more than 61.4 percentage points in stealthy O-RAN scenarios and improving win rates from 35\% to 53\% in competitive Humanoid environments. These results demonstrate the effectiveness of our test-time defense approach and highlight the importance of proactive defenses against backdoor threats in RL deployments. Our implementation is publicly available at https://github.com/rl-bandits-lab/RL-Backdoor.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes