Safety Guarantees in Zero-Shot Reinforcement Learning for Cascade Dynamical Systems
For control and robotics practitioners, this work addresses the challenge of ensuring safety in complex systems without retraining. The contribution is incremental, since it builds on existing ideas from reduced-order modeling and tracking control.
This paper introduces a method for zero-shot safe reinforcement learning in cascade dynamical systems: a policy is trained on a reduced-order model and combined with a low-level tracking controller, and a theoretical bound on the safety probability of the full system is provided. Experiments on quadrotor navigation show that the safety guarantees depend on the low-level controller's tracking capabilities.
This paper considers the problem of zero-shot safety guarantees for cascade dynamical systems. These are systems where a subset of the states (the inner states) affects the dynamics of the remaining states (the outer states) but not vice versa. We define safety as remaining in a set deemed safe at all times with high probability. We propose to train a safe RL policy on a reduced-order model, which ignores the dynamics of the inner states and instead treats them as an action that influences the outer states, thereby reducing the complexity of training. When deployed on the full system, the trained policy is combined with a low-level controller whose task is to track the reference provided by the RL policy. Our main theoretical contribution is a bound on the safety probability in the full-order system. In particular, we establish the interplay between the probability of remaining safe after the zero-shot deployment and the quality of the tracking of the inner states. We validate our theoretical findings on a quadrotor navigation task, demonstrating that the preservation of the safety guarantees is tied to the bandwidth and tracking capabilities of the low-level controller.
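The link between tracking quality and safety can be illustrated with a minimal sketch. It is a toy example under stated assumptions and not the paper's quadrotor experiment: the full-order system is assumed to be a double integrator (outer state x, inner state v), the safe set is assumed to be |x| <= 1, the hand-written linear `high_level_policy` is a hypothetical stand-in for a trained safe RL policy, and the low-level controller is assumed to be a simple proportional tracker. All names and numerical values are illustrative. The rollout is deterministic, so it shows only how poor tracking can break safety, not the probabilistic bound itself.

```python
def high_level_policy(x):
    """Hypothetical stand-in for a safe RL policy trained on the reduced-order
    model x' = a: returns a velocity reference a that drives x toward 0."""
    return -2.0 * x


def simulate(gain, x0=0.8, v0=3.0, dt=1e-3, steps=5000, x_max=1.0):
    """Forward-Euler rollout of the assumed full-order cascade
        x' = v   (outer state, driven by the inner state)
        v' = u   (inner state, driven by the control input)
    with an assumed proportional tracking controller u = gain * (v_ref - v)."""
    x, v = x0, v0
    max_abs_x = abs(x)
    err_sum = 0.0
    for _ in range(steps):
        v_ref = high_level_policy(x)   # reference from the high-level policy
        u = gain * (v_ref - v)         # low-level controller tracks v_ref
        err_sum += abs(v_ref - v)      # accumulate the tracking error
        x, v = x + dt * v, v + dt * u  # Euler step using the pre-update state
        max_abs_x = max(max_abs_x, abs(x))
    return {
        "max_abs_x": max_abs_x,
        "mean_track_err": err_sum / steps,
        "safe": max_abs_x <= x_max,    # did x stay in the assumed safe set?
    }


if __name__ == "__main__":
    # Compare a low-bandwidth and a high-bandwidth tracking controller.
    for gain in (2.0, 50.0):
        print(gain, simulate(gain))
```

In this sketch the inner state starts far from the commanded reference. With the low gain the mismatch persists long enough for the outer state to leave the safe set, while with the high gain the reference is tracked quickly and the outer state stays inside it, which mirrors the qualitative dependence on controller bandwidth described above.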