RO LG | Oct 25, 2023

Multi-Agent Reinforcement Learning-Based UAV Pathfinding for Obstacle Avoidance in Stochastic Environment

Qizhen Wu, Kexin Liu, Lei Chen, Jinhu Lü

arXiv:2310.16659v21.93 citations | h-index: 6 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses obstacle avoidance for UAVs in dynamic settings, offering an incremental improvement over existing methods by enhancing training efficiency and decentralization.

The paper tackles pathfinding for multiple UAVs in stochastic environments. It proposes a centralized training with decentralized execution method based on multi-agent reinforcement learning, augmented with a multi-step value update inspired by model predictive control and with a distance-weighted mean field observation of neighboring agents. The authors report that this reduces the number of training interactions and that experiments support the method's effectiveness.

Traditional methods plan feasible paths for multiple agents in a stochastic environment. However, these methods must iterate again whenever the environment changes, which raises computational complexity, especially for decentralized agents without a centralized planner. Although reinforcement learning offers a plausible solution because it generalizes across environments, it requires an enormous number of agent-environment interactions during training. Here, we propose a novel centralized training with decentralized execution method based on multi-agent reinforcement learning, improved with ideas from model predictive control. In our approach, agents communicate only with the centralized planner to make decentralized decisions online in the stochastic environment. Furthermore, considering the communication constraint with the centralized planner, each agent plans feasible paths through an extended observation, which combines information on neighboring agents using a distance-weighted mean field approach. Inspired by the rolling optimization approach of model predictive control, we conduct multi-step value convergence in multi-agent reinforcement learning to enhance training efficiency, which reduces the expensive interactions needed for convergence. Experimental results from comparison, ablation, and real-robot studies validate the effectiveness and generalization performance of our method.
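
The abstract names two mechanisms without giving formulas: an extended observation built from a distance-weighted mean field over neighbors, and a multi-step value update. The sketch below is one possible reading of each, not the authors' implementation. The inverse-distance weights, the `radius` cut-off, the `eps` term, and all function names are assumptions made for illustration, and the standard n-step return is used here as a stand-in for the paper's "multi-step value convergence".

```python
import math


def distance_weighted_mean_field(own_pos, neighbor_pos, neighbor_feats,
                                 radius=5.0, eps=1e-6):
    """Average neighbor features, weighting closer neighbors more heavily.

    Assumed weighting (not from the paper): w_j = 1 / (d_j + eps) for every
    neighbor within `radius`, normalized so the weights sum to 1.
    Returns a zero vector when no neighbor is in range.
    """
    dim = len(neighbor_feats[0]) if neighbor_feats else 0
    weights, feats = [], []
    for pos, feat in zip(neighbor_pos, neighbor_feats):
        d = math.dist(own_pos, pos)  # Euclidean distance to this neighbor
        if d <= radius:
            weights.append(1.0 / (d + eps))
            feats.append(feat)
    if not weights:
        return [0.0] * dim
    total = sum(weights)
    return [sum(w * f[k] for w, f in zip(weights, feats)) / total
            for k in range(dim)]


def extended_observation(own_obs, mean_field):
    """Concatenate an agent's own observation with the neighbor mean field."""
    return list(own_obs) + list(mean_field)


def n_step_target(rewards, bootstrap_value, gamma=0.99):
    """Standard n-step return: r_0 + g*r_1 + ... + g^n * V(s_n)."""
    target = bootstrap_value
    for r in reversed(rewards):  # fold rewards in from the last step backward
        target = r + gamma * target
    return target
```

A possible use: each agent calls `distance_weighted_mean_field` with its own position and the positions and features of the other agents, then feeds `extended_observation(...)` to its policy. Because the mean field has a fixed size, the policy input does not grow with the number of neighbors.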
