RO LG Jun 3, 2025

Learned Controllers for Agile Quadrotors in Pursuit-Evasion Games

Alejandro Sanchez Roncero, Yixi Cai, Olov Andersson, Petter Ogren

arXiv:2506.02849v2 3.2 h-index: 1

Originality: Incremental advance

AI Analysis

This work addresses the problem of training stable and effective controllers for agile drone pursuit-evasion games. The advance is incremental: it builds on existing RL methods and adds specific algorithmic improvements.

The paper tackles agile quadrotor pursuit-evasion with an Asynchronous Multi-Stage Population-Based (AMSPB) algorithm that addresses non-stationarity and catastrophic forgetting in reinforcement learning. The resulting policies outperform baselines, fly more agilely when they command body rates rather than velocities, and generalize across arena sizes.

We address the problem of agile 1v1 quadrotor pursuit-evasion, where a pursuer and an evader learn to outmaneuver each other through reinforcement learning (RL). Such settings face two major challenges: non-stationarity, since each agent's evolving policy alters the environment dynamics and destabilizes training, and catastrophic forgetting, where a policy overfits to the current adversary and loses effectiveness against previously encountered strategies. To tackle these issues, we propose an Asynchronous Multi-Stage Population-Based (AMSPB) algorithm. At each stage, the pursuer and evader are trained asynchronously against a frozen pool of opponents sampled from a growing population of past and current policies, stabilizing training and ensuring exposure to diverse behaviors. Within this framework, we train neural network controllers that output either velocity commands or body rates with collective thrust. Experiments in a high-fidelity simulator show that: (i) AMSPB-trained RL policies outperform RL and geometric baselines; (ii) body-rate-and-thrust controllers achieve more agile flight than velocity-based controllers, leading to better pursuit-evasion performance; (iii) AMSPB yields stable, monotonic gains across stages; and (iv) policies trained in one arena size generalize fairly well to other sizes without retraining.
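The stage structure described in the abstract can be illustrated with a toy loop. This is only a minimal sketch under stated assumptions, not the authors' implementation: the `Policy` class, `train_against`, `sample_pool`, `run_stages`, and every parameter are hypothetical names, the RL update is replaced by a counter, and the two agents are trained one after the other within a stage rather than in truly asynchronous processes.

```python
import copy
import random
from dataclasses import dataclass


@dataclass
class Policy:
    """Hypothetical stand-in for a neural network controller."""
    role: str          # "pursuer" or "evader"
    stage: int = 0     # stage at which this snapshot was taken
    updates: int = 0   # placeholder for learned weights


def sample_pool(population, pool_size, rng):
    """Sample a frozen pool: deep copies, so training cannot alter them."""
    k = min(pool_size, len(population))
    return [copy.deepcopy(p) for p in rng.sample(population, k)]


def train_against(learner, frozen_pool, steps, rng):
    """Placeholder for RL training against a fixed set of opponents."""
    for _ in range(steps):
        _opponent = rng.choice(frozen_pool)  # opponent is read, never updated
        # A real system would roll out an episode and apply an RL update here.
        learner.updates += 1
    return learner


def run_stages(num_stages=3, pool_size=2, steps=10, seed=0):
    rng = random.Random(seed)
    pursuer, evader = Policy("pursuer"), Policy("evader")
    # Populations start with the untrained policies and grow every stage.
    pursuers = [copy.deepcopy(pursuer)]
    evaders = [copy.deepcopy(evader)]
    for stage in range(1, num_stages + 1):
        # Both pools are drawn before either agent trains in this stage.
        evader_pool = sample_pool(evaders, pool_size, rng)
        pursuer_pool = sample_pool(pursuers, pool_size, rng)
        pursuer = train_against(pursuer, evader_pool, steps, rng)
        evader = train_against(evader, pursuer_pool, steps, rng)
        pursuer.stage = evader.stage = stage
        # Snapshot the current policies into the growing populations.
        pursuers.append(copy.deepcopy(pursuer))
        evaders.append(copy.deepcopy(evader))
    return pursuers, evaders


if __name__ == "__main__":
    pursuers, evaders = run_stages()
    print(len(pursuers), len(evaders))
```

Because each learner only ever faces frozen copies, its opponents do not change during a stage, which is the property the abstract credits with stabilizing training. Sampling from the whole population, rather than only the latest snapshot, is what keeps earlier strategies in the training mix.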
