RO LG MA OC | Feb 5, 2025

Discrete GCBF Proximal Policy Optimization for Multi-agent Safe Optimal Control

Songyuan Zhang, Oswin So, Mitchell Black, Chuchu Fan

MIT

arXiv:2502.03640v3 | 14.511 citations | h-index: 23 | Has Code | ICLR

Originality: Incremental advance

AI Analysis

This paper addresses the problem of ensuring both safety and optimality in multi-agent systems, with applications such as robotics and autonomous vehicles. The advance is incremental because it builds on existing control barrier function and reinforcement learning methods.

The paper tackles the challenge of designing safe, high-performance control policies for multi-agent systems with unknown dynamics and input constraints. It proposes DGPPO, a framework that simultaneously learns a discrete graph control barrier function and a distributed policy. Across multiple simulation environments, the learned policies reach safety rates matching the most conservative baselines and task performance matching baselines that ignore the safety constraints.

Control policies that can achieve high task performance and satisfy safety constraints are desirable for any system, including multi-agent systems (MAS). One promising technique for ensuring the safety of MAS is distributed control barrier functions (CBF). However, it is difficult to design distributed CBF-based policies for MAS that can tackle unknown discrete-time dynamics, partial observability, changing neighborhoods, and input constraints, especially when a distributed high-performance nominal policy that can achieve the task is unavailable. To tackle these challenges, we propose DGPPO, a new framework that simultaneously learns both a discrete graph CBF which handles neighborhood changes and input constraints, and a distributed high-performance safe policy for MAS with unknown discrete-time dynamics. We empirically validate our claims on a suite of multi-agent tasks spanning three different simulation engines. The results suggest that, compared with existing methods, our DGPPO framework obtains policies that achieve high task performance (matching baselines that ignore the safety constraints), and high safety rates (matching the most conservative baselines), with a constant set of hyperparameters across all environments.
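
To make the idea of a discrete-time CBF concrete, the following is a minimal sketch of the standard one-step condition, not the DGPPO method itself. It assumes a sign convention in which h(x) >= 0 means safe (the paper may use a different convention), a single agent rather than a graph of agents, and known toy dynamics, whereas the paper targets unknown dynamics. The names `discrete_cbf_holds`, `step`, `h`, and `alpha` are hypothetical and chosen for illustration only.

```python
from typing import Callable, Sequence

State = Sequence[float]


def discrete_cbf_holds(
    h: Callable[[State], float],
    step: Callable[[State, float], State],
    x: State,
    u: float,
    alpha: float = 0.5,
) -> bool:
    """Check the one-step discrete-time CBF condition.

    Assumed convention: h(x) >= 0 means x is safe. The condition
    h(x_next) - h(x) >= -alpha * h(x), with 0 < alpha <= 1, is
    rewritten here as h(x_next) >= (1 - alpha) * h(x).
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    x_next = step(x, u)
    return h(x_next) >= (1.0 - alpha) * h(x)


# Toy example (an assumption, not from the paper): a 1-D single
# integrator that must stay to the right of a wall at position 0.
def toy_step(x: State, u: float) -> State:
    return [x[0] + u]  # x_{k+1} = x_k + u_k with unit time step


def toy_h(x: State) -> float:
    return x[0]  # distance to the wall; non-negative means safe


slow_ok = discrete_cbf_holds(toy_h, toy_step, [1.0], -0.2)
fast_ok = discrete_cbf_holds(toy_h, toy_step, [1.0], -0.8)
```

In this toy setting, a slow approach to the wall satisfies the condition while a fast approach violates it, even though both next states are still safe. The condition limits how quickly the safety margin may shrink, which is what lets a barrier function rule out unsafe states in advance.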
