LG, Jun 15, 2025

TrojanTO: Action-Level Backdoor Attacks against Trajectory Optimization Models

arXiv:2506.12815v1, 1 citation, h-index: 7
Originality: Highly original
AI Analysis

This paper addresses a security problem for users of offline reinforcement learning models and presents a novel attack approach rather than an incremental improvement.

The paper tackles the vulnerability of Trajectory Optimization models to backdoor attacks by proposing TrojanTO, an action-level attack method that achieves effective implantation with a low attack budget of 0.3% of trajectories across diverse tasks and model architectures.

Recent advances in Trajectory Optimization (TO) models have achieved remarkable success in offline reinforcement learning. However, their vulnerability to backdoor attacks is poorly understood. We find that existing backdoor attacks in reinforcement learning are based on reward manipulation, which is largely ineffective against TO models due to their inherent sequence modeling nature. Moreover, the complexities introduced by high-dimensional action spaces further compound the challenge of action manipulation. To address these gaps, we propose TrojanTO, the first action-level backdoor attack against TO models. TrojanTO employs alternating training to strengthen the connection between triggers and target actions, which improves attack effectiveness. To improve attack stealth, it uses precise poisoning via trajectory filtering to preserve normal performance and batch poisoning to keep triggers consistent. Extensive evaluations demonstrate that TrojanTO effectively implants backdoors across diverse tasks and attack objectives with a low attack budget (0.3% of trajectories). Furthermore, TrojanTO exhibits broad applicability to DT, GDT, and DC, underscoring its scalability across diverse TO model architectures.
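To make the attack budget and the idea of action-level poisoning concrete, here is a minimal toy sketch. It is not the authors' implementation: the function name `poison_trajectories`, the data layout (lists of `(state, action, reward)` tuples), the way the trigger is stamped onto the state, and the random choice of trajectories are all assumptions made for illustration. TrojanTO's alternating training, trajectory filtering, and batch poisoning are not reproduced here. The sketch only shows that, with a 0.3% budget, 3 of 1,000 trajectories are modified, and that the actions are rewritten while the rewards stay untouched.

```python
import random


def poison_trajectories(trajectories, trigger, target_action, budget=0.003, seed=0):
    """Toy action-level poisoning (hypothetical helper, not from the paper).

    trajectories: list of trajectories; each is a list of
                  (state, action, reward) tuples with list-valued state/action.
    trigger:      values written over the leading state dimensions (assumed form).
    budget:       fraction of trajectories to poison, e.g. 0.003 for 0.3%.
    Returns (poisoned_copy, sorted indices of poisoned trajectories).
    """
    rng = random.Random(seed)
    # Number of trajectories allowed by the budget (at least one).
    n_poison = max(1, round(budget * len(trajectories)))
    # Assumption: uniform random selection stands in for trajectory filtering.
    chosen = set(rng.sample(range(len(trajectories)), n_poison))

    poisoned = []
    for i, traj in enumerate(trajectories):
        new_traj = list(traj)  # shallow copy so the input list is not modified
        if i in chosen:
            t = rng.randrange(len(traj))  # one poisoned step per trajectory
            state, _old_action, reward = traj[t]
            # Stamp the trigger onto the state and swap in the target action.
            triggered_state = list(trigger) + list(state[len(trigger):])
            new_traj[t] = (triggered_state, list(target_action), reward)
        poisoned.append(new_traj)
    return poisoned, sorted(chosen)


if __name__ == "__main__":
    # 1,000 dummy trajectories, 5 steps each, 4-dim states, 2-dim actions.
    clean = [[([0.0] * 4, [0.0, 0.0], 1.0) for _ in range(5)] for _ in range(1000)]
    dirty, idx = poison_trajectories(clean, trigger=[9.0, 9.0], target_action=[1.0, -1.0])
    print(len(idx), "trajectories poisoned out of", len(clean))
```

Because the reward field is copied through unchanged, this kind of poisoning differs from the reward-manipulation attacks the abstract describes as ineffective against sequence-modeling approaches.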
