RO · AI · LG · Jun 12, 2024

Hierarchical Reinforcement Learning for Swarm Confrontation with High Uncertainty

arXiv:2406.07877v2 · 11 citations
Originality: Incremental advance
AI Analysis

This work addresses swarm confrontation, such as pursuit-evasion games, in swarm robotics. It offers a solution for scenarios with dynamic obstacles and unknown opponent strategies, though the contribution is incremental.

The paper tackles the problem of swarm confrontation under high uncertainty by proposing a hierarchical reinforcement learning approach that decouples hybrid decision processes into discrete allocation and continuous planning layers, achieving a win rate of around 90% in experiments with 20 to 40 agents.
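The decoupling of a hybrid decision into a discrete allocation layer and a continuous planning layer can be illustrated with a toy two-layer loop. This is a minimal sketch under stated assumptions, not the authors' method: `allocate_targets`, `plan_step`, and `run_episode` are hypothetical names, a greedy nearest-target rule stands in for the learned discrete allocation policy, a speed-capped straight-line step stands in for the learned continuous planner, and the allocation layer is re-queried at a fixed interval (`realloc_every`), whereas the paper regulates that frequency adaptively.

```python
import math
from typing import List, Tuple

Vec = Tuple[float, float]


def allocate_targets(agents: List[Vec], targets: List[Vec]) -> List[int]:
    """Discrete layer (hypothetical stand-in for a learned allocation policy):
    assign each agent the index of its nearest target."""
    return [
        min(range(len(targets)), key=lambda j: math.dist(a, targets[j]))
        for a in agents
    ]


def plan_step(agent: Vec, target: Vec, max_speed: float = 1.0) -> Vec:
    """Continuous layer (hypothetical stand-in for a learned planner):
    move straight toward the assigned target, capped at max_speed per step."""
    dx, dy = target[0] - agent[0], target[1] - agent[1]
    d = math.hypot(dx, dy)
    if d <= max_speed:
        return target  # target reachable within one step
    return (agent[0] + max_speed * dx / d, agent[1] + max_speed * dy / d)


def run_episode(
    agents: List[Vec],
    targets: List[Vec],
    steps: int = 10,
    realloc_every: int = 3,  # assumption: fixed interval, not adaptive
) -> Tuple[List[Vec], List[int]]:
    """Run the two layers: reallocate every `realloc_every` steps,
    plan a continuous move for every agent at every step."""
    assignment = allocate_targets(agents, targets)
    for t in range(steps):
        if t > 0 and t % realloc_every == 0:
            assignment = allocate_targets(agents, targets)
        agents = [plan_step(a, targets[j]) for a, j in zip(agents, assignment)]
    return agents, assignment
```

For example, `run_episode([(0.0, 0.0), (10.0, 0.0)], [(1.0, 0.0), (9.0, 0.0)])` assigns each agent its nearest target and moves both onto their targets. Keeping the discrete choice (which target) separate from the continuous choice (how to move) lets each layer use a policy suited to its own action space.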

In swarm robotics, confrontation, including the pursuit-evasion game, is a key scenario. High uncertainty caused by unknown opponent strategies, dynamic obstacles, and insufficient training complicates the action space, turning it into a hybrid decision process. Deep reinforcement learning is valuable for swarm confrontation because it can handle swarms of various sizes, but as an end-to-end implementation it cannot deal with this hybrid process. Here, we propose a novel hierarchical reinforcement learning approach consisting of a target allocation layer, a path planning layer, and an underlying dynamic interaction mechanism between the two layers that reflects the quantified uncertainty. It decouples the hybrid process into discrete allocation and continuous planning layers, using a probabilistic ensemble model to quantify the uncertainty and regulate the interaction frequency adaptively. Furthermore, to overcome the unstable training introduced by the two layers, we design an integrated training method, comprising pre-training and cross-training, that improves training efficiency and stability. Results from comparison, ablation, and real-robot studies validate the effectiveness and generalization performance of the proposed approach. In our experiments with twenty to forty agents, the proposed method achieves a win rate of around ninety percent, outperforming traditional methods.
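The idea of using a probabilistic ensemble to quantify uncertainty and adapt how often the two layers interact can be sketched as follows. This is a hedged illustration, not the paper's model: each ensemble member is assumed to output a Gaussian (mean, variance) prediction, the standard decomposition into aleatoric uncertainty (mean of member variances) and epistemic uncertainty (variance of member means) is used, and the inverse mapping in the hypothetical `interaction_interval` (with made-up parameters `base`, `alpha`, `min_steps`, `max_steps`) is an assumption chosen only to show that higher uncertainty shortens the interval between allocation-layer queries.

```python
from statistics import fmean, pvariance
from typing import List, Tuple


def ensemble_uncertainty(preds: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Combine K Gaussian member predictions given as (mean, variance).

    Returns (aleatoric, epistemic):
      aleatoric = mean of member variances (noise each member predicts)
      epistemic = variance of member means (disagreement between members)
    """
    means = [m for m, _ in preds]
    variances = [v for _, v in preds]
    return fmean(variances), pvariance(means)


def interaction_interval(
    uncertainty: float,
    base: int = 10,
    alpha: float = 5.0,
    min_steps: int = 1,
    max_steps: int = 10,
) -> int:
    """Hypothetical rule: map uncertainty to the number of planning steps
    between allocation-layer queries. Higher uncertainty gives a shorter
    interval, i.e. more frequent interaction between the layers."""
    steps = round(base / (1.0 + alpha * uncertainty))
    return max(min_steps, min(max_steps, steps))
```

As a usage example, when members agree (`[(1.0, 0.1), (1.0, 0.1)]`) the epistemic term is `0.0` and the interval stays at its maximum of 10 steps; when they disagree (`[(0.0, 0.1), (2.0, 0.1)]`) the epistemic term is `1.0` and the interval drops to 2 steps. Feeding only the epistemic term into the interval rule is itself an assumption: it targets the uncertainty that more data or retraining could reduce, which matches the abstract's mention of insufficient training as one uncertainty source.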

Code Implementations: 1 repo