Synthesis and Deployment of Maximal Robust Control Barrier Functions through Adversarial Reinforcement Learning

Donggeon David Oh, Duy P. Nguyen, Haimin Hu, Jaime Fernández Fisac

arXiv:2604.13192

AI Analysis

For safety-critical control of nonlinear systems with black-box dynamics and unknown uncertainty structure, this work provides a principled method to enforce safety on the maximal robust safe set.

This paper introduces a robust control barrier function (CBF) framework for general nonlinear systems under bounded uncertainty. It uses the Q-function from reinforcement learning, trained adversarially, to synthesize and deploy robust Q-CBFs that enforce safety on the maximal robust safe set. The method achieves less conservative safe sets than barrier-based baselines on an inverted pendulum and reliable safety enforcement under adversarial uncertainty realizations on a 36-D quadruped simulator.

Robust control barrier functions (CBFs) provide a principled mechanism for smooth safety enforcement under worst-case disturbances. However, existing approaches typically rely on explicit, closed-form structure in the dynamics (e.g., control-affine) and uncertainty models. This has led to limited scalability and generality, with most robust CBFs certifying only conservative subsets of the maximal robust safe set. In this paper, we introduce a new robust CBF framework for general nonlinear systems under bounded uncertainty. We first show that the safety value function solving the dynamic programming Isaacs equation is a valid robust discrete-time CBF that enforces safety on the maximal robust safe set. We then adopt the key reinforcement learning (RL) notion of quality function (or Q-function), which removes the need for explicit dynamics by lifting the barrier certificate into state-action space and yields a novel robust Q-CBF constraint for safety filtering. Combined with adversarial RL, this enables the synthesis and deployment of robust Q-CBFs on general nonlinear systems with black-box dynamics and unknown uncertainty structure. We validate the framework on a canonical inverted pendulum benchmark and a 36-D quadruped simulator, achieving substantially less conservative safe sets than barrier-based baselines on the pendulum and reliable safety enforcement even under adversarial uncertainty realizations on the quadruped.
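The two ideas in the abstract, a safety value function obtained by dynamic programming and a state-action (Q) constraint used for filtering, can be illustrated on a tiny tabular problem. The sketch below is only an illustration under stated assumptions and is not the paper's implementation: the 1-D integer dynamics, the drift term, the names `margin`, `step`, `robust_q`, `solve_safety_value`, and `q_cbf_filter`, the parameter `gamma`, and the admissibility test `Q(x, u) >= gamma * V(x)` are all hypothetical choices. It also uses exact tabular value iteration with known dynamics, whereas the paper learns the Q-function with adversarial RL on black-box dynamics.

```python
import numpy as np

# Hypothetical toy problem (not from the paper): integer position on a line.
XS = np.arange(-6, 7)        # states x = -6..6
US = np.array([-1, 0, 1])    # control actions
DS = np.array([-1, 0, 1])    # bounded disturbance values
LIMIT = 4                    # constraint set is |x| <= LIMIT


def margin(x):
    """Safety margin g(x): non-negative exactly when the constraint holds."""
    return LIMIT - np.abs(x)


def step(x, u, d):
    """Assumed toy dynamics with a rightward drift that is active for x >= 3."""
    drift = 1 if x >= 3 else 0
    return int(np.clip(x + u + d + drift, XS[0], XS[-1]))


def idx(x):
    """Map a state value to its index in the value table."""
    return x - XS[0]


def robust_q(V):
    """Q(x, u) = min(g(x), min_d V(step(x, u, d))): worst case over d."""
    Q = np.empty((len(XS), len(US)))
    for i, x in enumerate(XS):
        for j, u in enumerate(US):
            worst = min(V[idx(step(x, u, d))] for d in DS)
            Q[i, j] = min(margin(x), worst)
    return Q


def solve_safety_value(max_iters=1000):
    """Iterate V <- max_u Q(x, u), starting from V = g, until it stops changing."""
    V = margin(XS).astype(float)
    for _ in range(max_iters):
        V_new = robust_q(V).max(axis=1)
        if np.array_equal(V_new, V):
            break
        V = V_new
    return V, robust_q(V)


def q_cbf_filter(x, u_nominal, V, Q, gamma=0.5):
    """Return the admissible action closest to u_nominal.

    Assumed admissibility test: Q(x, u) >= gamma * V(x), with 0 <= gamma <= 1.
    """
    i = idx(x)
    ok = US[Q[i] >= gamma * V[i]]
    if ok.size == 0:  # outside the safe set: fall back to the best-effort action
        return int(US[np.argmax(Q[i])])
    return int(ok[np.argmin(np.abs(ok - u_nominal))])


V, Q = solve_safety_value()
safe_states = XS[V >= 0]     # states from which safety can be kept for all d
```

In this toy, the states x = 3 and x = 4 satisfy the constraint but lie outside the computed safe set, because the drift plus the worst-case disturbance overpowers the control there. Calling `q_cbf_filter(2, 1, V, Q)` overrides a nominal action that moves right, while `q_cbf_filter(0, 1, V, Q)` leaves the nominal action unchanged because every action is admissible at the origin.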

