LG · Feb 9, 2024

Deceptive Path Planning via Reinforcement Learning with Graph Neural Networks

Michael Y. Fatemi, Wesley A. Suttle, Brian M. Sadler

arXiv:2402.06552v1 · 9.27 citations · h-index: 5 · Has Code · AAMAS

Originality: Highly original

AI Analysis

This paper addresses the challenge of designing paths that hide an agent's true goal from observers in dynamic environments. It offers a more flexible and scalable solution than prior problem-specific methods.

The paper tackles deceptive path planning (DPP) with a reinforcement learning approach that uses graph neural network policies. The approach drops the global-observability and perfect-model-knowledge assumptions of existing methods and achieves generalization, scalability, tunable deception, and real-time adaptivity without fine-tuning.

Deceptive path planning (DPP) is the problem of designing a path that hides its true goal from an outside observer. Existing methods for DPP rely on unrealistic assumptions, such as global state observability and perfect model knowledge, and are typically problem-specific: even minor changes to a previously solved problem can force the expensive computation of an entirely new solution. As a result, these methods do not generalize to unseen problem instances, do not scale to realistic problem sizes, and preclude both on-the-fly tuning of the deception level and real-time adaptation to changing environments. In this paper, we propose a reinforcement learning (RL)-based scheme that overcomes these issues by training policies to perform DPP over arbitrary weighted graphs. Our approach has four core components: a local perception model for the agent; a new state-space representation that distills the key components of the DPP problem; graph neural network (GNN)-based policies that facilitate generalization and scaling; and new deception bonuses that translate the deception objectives of classical methods to the RL setting. Through extensive experiments, we show that at test time, without additional fine-tuning, the resulting policies successfully generalize, scale, offer tunable levels of deception, and adapt in real time to changes in the environment.
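The abstract says the new deception bonuses translate the deception objectives of classical methods into the RL setting, but it does not give their form. As a rough illustration only, the sketch below assumes a cost-difference goal-recognition model of the kind used in classical, planning-based DPP: an observer who has seen a path prefix scores each candidate goal by how much extra cost the prefix implies compared with the optimal route to that goal, and converts those scores into probabilities with a softmax. The function `deception_bonus` is hypothetical; it simply rewards the probability mass the observer places on decoy goals. The graph encoding, all function names, the example graph, and the `beta` parameter are assumptions for illustration, not the authors' implementation.

```python
import heapq
import math
from typing import Dict, List

# Weighted, undirected graph: node -> {neighbor: edge_cost}. (Assumed encoding.)
Graph = Dict[str, Dict[str, float]]


def dijkstra(graph: Graph, source: str) -> Dict[str, float]:
    """Shortest-path cost from `source` to every reachable node."""
    dist = {source: 0.0}
    frontier = [(0.0, source)]
    while frontier:
        d, node = heapq.heappop(frontier)
        if d > dist[node]:
            continue  # stale queue entry; a shorter path was already found
        for nbr, w in graph[node].items():
            nd = d + w
            if nd < dist.get(nbr, math.inf):
                dist[nbr] = nd
                heapq.heappush(frontier, (nd, nbr))
    return dist


def goal_posterior(graph: Graph, prefix: List[str], goals: List[str],
                   beta: float = 1.0) -> Dict[str, float]:
    """Observer's belief over goals after seeing `prefix` (start node first).

    Cost-difference model (assumed):
      costdiff(g) = cost(prefix) + d(current, g) - d(start, g)
      P(g | prefix) is proportional to exp(-beta * costdiff(g))
    """
    start, current = prefix[0], prefix[-1]
    observed = sum(graph[a][b] for a, b in zip(prefix, prefix[1:]))
    # Weights are symmetric, so running Dijkstra from each goal gives d(x, g).
    # Assumes every goal is reachable from the start and current nodes.
    dist_to_goal = {g: dijkstra(graph, g) for g in goals}
    diffs = {g: observed + dist_to_goal[g][current] - dist_to_goal[g][start]
             for g in goals}
    weights = {g: math.exp(-beta * diffs[g]) for g in goals}
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}


def deception_bonus(graph: Graph, prefix: List[str], goals: List[str],
                    true_goal: str, beta: float = 1.0) -> float:
    """Hypothetical bonus: probability the observer assigns to decoy goals."""
    return 1.0 - goal_posterior(graph, prefix, goals, beta)[true_goal]


# Hypothetical example: start S, true goal T to the right, decoy D to the left.
EXAMPLE_GRAPH: Graph = {
    "S": {"L": 1.0, "R": 1.0},
    "L": {"S": 1.0, "D": 1.0, "R": 2.0},
    "R": {"S": 1.0, "T": 1.0, "L": 2.0},
    "D": {"L": 1.0},
    "T": {"R": 1.0},
}

if __name__ == "__main__":
    goals = ["T", "D"]
    # Stepping toward the decoy makes D look likely: bonus is about 0.881.
    print(round(deception_bonus(EXAMPLE_GRAPH, ["S", "L"], goals, "T"), 3))
    # Stepping straight toward T reveals the goal: bonus is about 0.119.
    print(round(deception_bonus(EXAMPLE_GRAPH, ["S", "R"], goals, "T"), 3))
```

In this toy setting, a larger `beta` makes the simulated observer more confident in the goal with the smallest cost difference, which in turn sharpens the bonus. Whether the paper tunes its deception level through a parameter like this is not stated in the abstract.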

