Guoxiang Zhao

RO · Mar 16, 2025

TERL: Large-Scale Multi-Target Encirclement Using Transformer-Enhanced Reinforcement Learning

Heng Zhang, Guoxiang Zhao, Xiaoqiang Ren

The pursuit-evasion (PE) problem is a critical challenge in multi-robot systems (MRS). While reinforcement learning (RL) has shown promise in addressing PE tasks, research has focused primarily on single-target pursuit, with limited exploration of multi-target encirclement, particularly in large-scale settings. This paper proposes a Transformer-Enhanced Reinforcement Learning (TERL) framework for large-scale multi-target encirclement. By integrating a transformer-based policy network with target selection, TERL enables robots to adaptively prioritize targets and coordinate safely with one another. Results show that TERL outperforms existing RL-based methods in encirclement success rate and task completion time, while maintaining good performance in large-scale scenarios. Notably, TERL, trained on small-scale scenarios (15 pursuers, 4 targets), generalizes effectively to large-scale settings (80 pursuers, 20 targets) without retraining, achieving a 100% success rate. The code and demonstration video are available at https://github.com/ApricityZ/TERL.
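The idea of a transformer-based policy that "adaptively prioritizes targets" can be illustrated with scaled dot-product attention. The sketch below is a minimal, hypothetical example and not TERL's actual architecture: it assumes each pursuer has a query vector and each target a key vector (for instance, hand-picked features such as relative position), scores every target, normalizes the scores with a softmax, and picks the highest-weighted target. The names `attention_weights` and `select_target` are illustrative only.

```python
import math
from typing import List, Sequence


def attention_weights(query: Sequence[float], keys: List[Sequence[float]]) -> List[float]:
    """Scaled dot-product attention weights of one pursuer over candidate targets.

    Hypothetical sketch: `query` and `keys` stand in for learned embeddings.
    """
    d = len(query)
    # Raw compatibility score between the pursuer and each target.
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    # Subtract the max score before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_target(query: Sequence[float], keys: List[Sequence[float]]) -> int:
    """Return the index of the target with the highest attention weight."""
    w = attention_weights(query, keys)
    return max(range(len(w)), key=lambda i: w[i])


if __name__ == "__main__":
    pursuer = [1.0, 0.0]
    targets = [[0.2, 0.9], [0.8, 0.1], [0.5, 0.5]]
    print(attention_weights(pursuer, targets))
    print(select_target(pursuer, targets))  # → 1
```

Because the softmax weights are continuous, a learned policy can shift attention between targets smoothly as the scene changes, rather than committing to a fixed assignment.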

OC · Feb 25, 2018

Pareto optimal multi-robot motion planning

Guoxiang Zhao, Minghui Zhu

This paper studies a class of multi-robot coordination problems in which a team of robots aims to reach its goal regions in minimum time while avoiding collisions with obstacles and with other robots. A novel numerical algorithm is proposed to identify Pareto optimal solutions, in which no robot can reduce its own traveling time without increasing that of another. Consistent approximation by the algorithm, in the sense of epigraphical profiles, is guaranteed using set-valued numerical analysis. Experiments on an indoor multi-robot platform and computer simulations demonstrate the anytime property of the proposed algorithm: it quickly returns a feasible control policy that safely steers the robots to their goal regions, and it keeps improving the policy's optimality when given more time.
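The Pareto optimality criterion above can be made concrete with a dominance check over a finite set of candidate outcomes. This is only an illustration of the definition, not the paper's numerical algorithm, which operates on continuous control policies using set-valued analysis. Each candidate here is a hypothetical tuple of per-robot travel times, and the candidate values are made up for the example.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if outcome `a` is no worse than `b` for every robot
    and strictly better for at least one robot."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(candidates: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep the candidates that no other candidate dominates.

    In a kept outcome, no robot can shorten its travel time
    without lengthening another robot's.
    """
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


if __name__ == "__main__":
    # Hypothetical joint travel times (robot 1, robot 2) for four candidate plans.
    plans = [(4.0, 6.0), (5.0, 5.0), (6.0, 6.0), (3.0, 9.0)]
    print(pareto_front(plans))  # → [(4.0, 6.0), (5.0, 5.0), (3.0, 9.0)]
```

The plan (6.0, 6.0) is discarded because (5.0, 5.0) is faster for both robots; the remaining three trade one robot's time against the other's, so none can be improved for one robot without hurting the other.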


2 Papers