RO, AI, LG | Dec 17, 2024

Physics-model-guided Worst-case Sampling for Safe Reinforcement Learning

arXiv:2412.13224v1
h-index: 3
Originality: Incremental advance
AI Analysis

This work addresses safety issues in learning-enabled cyber-physical systems, offering an incremental improvement in sampling efficiency for training safe policies.

The paper addresses safety-critical corner cases in deep reinforcement learning for cyber-physical systems. It proposes a physics-model-guided worst-case sampling strategy and reports improved sampling efficiency and more robust safe policies on both simulated and real robots.

Real-world accidents in learning-enabled cyber-physical systems (CPS) frequently occur in challenging corner cases. When training a deep reinforcement learning (DRL) policy, the initial condition is typically either fixed at a single state or sampled uniformly from the admissible state space. This setup often overlooks the challenging but safety-critical corner cases. To bridge this gap, this paper proposes a physics-model-guided worst-case sampling strategy for training safe policies that can handle safety-critical cases toward guaranteed safety. Furthermore, we integrate the proposed worst-case sampling strategy into the physics-regulated deep reinforcement learning (Phy-DRL) framework to build a more data-efficient and safe learning algorithm for safety-critical CPS. We validate the proposed training strategy with Phy-DRL through extensive experiments on a simulated cart-pole system, a 2D quadrotor, and both a simulated and a real quadruped robot, showing remarkably improved sampling efficiency in learning more robust safe policies.
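To make the contrast between the two sampling setups concrete, here is a minimal sketch in Python. It is not the paper's algorithm: the abstract does not say how worst cases are identified. The sketch assumes, purely for illustration, that the safe set is an axis-aligned ellipsoid {s : sum_i p_i * s_i^2 <= 1} and that the "worst cases" are initial states on its boundary. The function names, the two-dimensional state, and the numeric limits are all hypothetical.

```python
import math
import random


def sample_uniform(bounds, rng):
    """Baseline setup: draw an initial state uniformly from a box of admissible states."""
    return [rng.uniform(lo, hi) for lo, hi in bounds]


def sample_boundary(p_diag, rng):
    """Illustrative worst-case setup: draw an initial state on the boundary of the
    ASSUMED safe set {s : sum_i p_i * s_i**2 <= 1}.

    A random direction on the unit sphere is rescaled per axis so that the
    result satisfies sum_i p_i * s_i**2 == 1. The resulting distribution is
    not uniform with respect to the surface area of the ellipsoid.
    """
    direction = [rng.gauss(0.0, 1.0) for _ in p_diag]
    norm = math.sqrt(sum(d * d for d in direction))
    return [d / (norm * math.sqrt(p)) for d, p in zip(direction, p_diag)]


def safety_value(state, p_diag):
    """Quadratic form sum_i p_i * s_i**2; equals 1 on the assumed safety boundary."""
    return sum(p * s * s for p, s in zip(p_diag, state))


rng = random.Random(0)

# Hypothetical limits for a 2D state, e.g. (angle, angular velocity).
limits = [0.8, 2.0]
bounds = [(-m, m) for m in limits]
p_diag = [1.0 / (m * m) for m in limits]

uniform_states = [sample_uniform(bounds, rng) for _ in range(1000)]
boundary_states = [sample_boundary(p_diag, rng) for _ in range(1000)]
```

Under these assumptions, every boundary sample starts an episode at the edge of the safe set, whereas uniform samples mostly land in the interior, where keeping the system safe is easier. Note that the uniform box used here also contains corner points outside the assumed ellipsoid.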

