AI · May 8

Repeated Deceptive Path Planning against Learnable Observer

Shiyue Cao, Pei Xu, Likun Yang, Lei Cui, Shizhao Yu, Shiyu Zhang, Yongjian Ren, Xiaotang Chen, Kaiqi Huang

arXiv:2605.07174 · 17.2

Predicted impact top 50% in AI · last 90 days · Originality Highly original

AI Analysis

This work addresses the problem of deceptive path planning against adaptive, learning observers, which is critical for real-world applications like military operations and critical goods transportation.

The paper introduces Repeated Deceptive Path Planning (RDPP) to model learnable observers and proposes Deceptive Meta Planning (DeMP), which uses meta-learning to sustain deception across repeated interactions. DeMP significantly outperforms existing methods in deception while maintaining competitive path cost.

We study deceptive path planning (DPP), in which an agent aims to conceal its true destination from external observers. Existing work assumes static, non-learning observers, but real-world adversaries, such as those in critical goods transportation or military operations, can adapt by learning from historical trajectories. To address this gap, we introduce Repeated Deceptive Path Planning (RDPP), a new formulation that explicitly models learnable observers. We show that existing DPP methods fail in this setting because they cannot adapt to evolving adversarial predictions. Incorporating the observer's previous predictions into updates enables some adaptation, but such incremental updates accumulate lag that degrades deception. We therefore propose Deceptive Meta Planning (DeMP), a two-level optimization framework that combines episode-level adaptation, which enables short-term policy adjustment to counter the updated observer, with meta-level updates, which leverage cross-episode feedback to capture how observers update their models and accelerate adaptation in future episodes. In this way, DeMP mitigates the accumulation of adaptation lag, enabling sustained deception against a learning observer. Experiments across environments demonstrate that DeMP significantly outperforms existing approaches in RDPP while maintaining competitive path cost. Our results highlight the importance of modeling repeated interactions with learnable adversaries, providing new insights into deception and privacy in multi-agent systems.
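To make the lag argument concrete, here is a toy sketch and not the paper's algorithm. It assumes a single scalar policy parameter, a best response that moves by a fixed `drift` each time the observer learns from an episode, and a squared tracking loss. All names (`adapt`, `run`, `predicted_shift`) are hypothetical. The episode-level step is a few gradient updates; the meta-level step estimates from cross-episode feedback how far the best response moves between episodes and uses that estimate to initialize the next episode.

```python
def adapt(theta, target, steps=3, lr=0.2):
    """Episode-level adaptation: a few gradient steps on (theta - target)^2."""
    for _ in range(steps):
        theta -= lr * 2.0 * (theta - target)
    return theta


def run(episodes=30, drift=1.0, use_meta=False, meta_lr=0.5):
    """Return the per-episode tracking error |theta - target| after adaptation.

    Assumption (toy model): each time the observer learns from an episode,
    the agent's best response `target` moves by a constant `drift`.
    """
    theta = 0.0            # agent's policy parameter (scalar stand-in)
    target = 0.0           # best response to the current observer
    predicted_shift = 0.0  # meta-level estimate of the per-episode shift
    errors = []
    for _ in range(episodes):
        target += drift  # the observer has updated; the best response moves
        # The meta-level initialization anticipates the observer's update.
        start = theta + (predicted_shift if use_meta else 0.0)
        adapted = adapt(start, target)
        errors.append(abs(adapted - target))
        if use_meta:
            # Meta-level update from cross-episode feedback: how far did the
            # adapted parameter actually move since the previous episode?
            observed_shift = adapted - theta
            predicted_shift += meta_lr * (observed_shift - predicted_shift)
        theta = adapted
    return errors


if __name__ == "__main__":
    print("incremental only:", run(use_meta=False)[-1])
    print("with meta-level :", run(use_meta=True)[-1])
```

In this toy setting the incremental variant settles at a persistent nonzero tracking error, because it always starts an episode one observer update behind. The meta variant's error shrinks toward zero once `predicted_shift` matches the drift. A real RDPP instance would replace the scalar loss with a deception objective over paths and the constant drift with an actual learning observer.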
