LG AI ML

Jun 19, 2020

FISAR: Forward Invariant Safe Reinforcement Learning with a Deep Neural Network-Based Optimizer

Chuangchuang Sun, Dong-Ki Kim, Jonathan P. How

arXiv:2006.11419v4

5.88 citations

Originality: Incremental advance

AI Analysis

This work addresses safety constraints in reinforcement learning for applications like robotics, though it appears incremental as it builds on existing constrained optimization methods with a novel optimizer design.

The paper tackles constrained reinforcement learning in safety-critical environments. It proposes a deep neural network-based optimizer that ensures forward invariance of the safety constraints, so that constraint violations decrease monotonically while cumulative reward is maximized, and it validates the approach on numerical optimization and obstacle-avoidance navigation tasks.

This paper investigates reinforcement learning with constraints, which are indispensable in safety-critical environments. To drive the constraint violation to decrease monotonically, we take the constraints as Lyapunov functions and impose new linear constraints on the policy parameters' updating dynamics. As a result, the original safety set can be forward-invariant. However, because the new guaranteed-feasible constraints are imposed on the updating dynamics instead of the original policy parameters, classic optimization algorithms are no longer applicable. To address this, we propose to learn a generic deep neural network (DNN)-based optimizer to optimize the objective while satisfying the linear constraints. Constraint satisfaction is achieved via projection onto a polytope defined by multiple linear inequality constraints, which can be solved analytically with our newly designed metric. To the best of our knowledge, this is the first DNN-based optimizer for constrained optimization with the forward invariance guarantee. We show that our optimizer trains a policy to decrease the constraint violation and maximize the cumulative reward monotonically. Results on numerical constrained optimization and obstacle-avoidance navigation validate the theoretical findings.
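The idea of constraining the update rather than the parameters can be illustrated with a minimal sketch. It is not the paper's method: it assumes a single constraint c(theta) <= 0, a plain gradient step in place of the learned DNN optimizer, and an ordinary Euclidean projection onto one halfspace in place of the paper's polytope projection under its own metric. The names `project_onto_halfspace`, `safe_update`, `lr`, and `alpha` are hypothetical. The linear condition imposed on the update `d` is grad_c . d <= -alpha * c, which in continuous time would make the violation decay; with discrete steps this holds only to first order.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


def project_onto_halfspace(d, a, b):
    """Euclidean projection of d onto the halfspace {x : a . x <= b}."""
    violation = dot(a, d) - b
    norm_sq = dot(a, a)
    if violation <= 0.0 or norm_sq == 0.0:
        return list(d)  # already satisfies the linear constraint
    scale = violation / norm_sq
    return [di - scale * ai for di, ai in zip(d, a)]


def safe_update(grad_f, c_val, grad_c, lr=0.1, alpha=0.5):
    """Propose a gradient step on the objective, then project it so that
    grad_c . d <= -alpha * c_val (assumed Lyapunov-style decrease condition)."""
    proposed = [-lr * g for g in grad_f]
    return project_onto_halfspace(proposed, grad_c, -alpha * c_val)


# Toy problem (an assumption for illustration): minimize the squared
# distance to (2, 2) subject to staying inside the unit disk.
theta = [2.0, 0.0]  # starts outside the safe set
for _ in range(50):
    c_val = theta[0] ** 2 + theta[1] ** 2 - 1.0
    grad_c = [2.0 * theta[0], 2.0 * theta[1]]
    grad_f = [2.0 * (theta[0] - 2.0), 2.0 * (theta[1] - 2.0)]
    step = safe_update(grad_f, c_val, grad_c)
    theta = [t + s for t, s in zip(theta, step)]

print("final theta:", theta)
print("final constraint value:", theta[0] ** 2 + theta[1] ** 2 - 1.0)
```

When the current point is infeasible (c > 0), the right-hand side -alpha * c is negative, so the projected step is forced to reduce the linearized constraint value. When the point is strictly feasible, the bound is positive and the update has room to pursue the objective.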
