AI LG · Feb 20, 2023

Safe Deep Reinforcement Learning by Verifying Task-Level Properties

Enrico Marchesini, Luca Marzari, Alessandro Farinelli, Christopher Amato

arXiv:2302.10030v1 · 12.518 citations · h-index: 33

Originality: Incremental advance

AI Analysis

This work addresses safety and sample efficiency issues in Safe DRL for robotics applications, representing an incremental improvement over existing methods.

The paper tackles the problem of unsafe state visits in Safe Deep Reinforcement Learning by introducing a violation metric based on task-level property verification. The metric serves as a penalty that biases policies away from unsafe states without learning an additional cost-value function. The results show that policies trained with this penalty achieve higher performance than the baselines and significantly reduce the number of visited unsafe states in benchmarks and robotic navigation tasks.

Cost functions are commonly employed in Safe Deep Reinforcement Learning (DRL). However, the cost is typically encoded as an indicator function due to the difficulty of quantifying the risk of policy decisions in the state space. Such an encoding requires the agent to visit numerous unsafe states to learn a cost-value function that drives the learning process toward safety, which increases the number of unsafe interactions and decreases sample efficiency. In this paper, we investigate an alternative approach that uses domain knowledge to quantify the risk in the proximity of such states by defining a violation metric. This metric is computed by verifying task-level properties, shaped as input-output conditions, and it is used as a penalty to bias the policy away from unsafe states without learning an additional value function. We investigate the benefits of using the violation metric in standard Safe DRL benchmarks and robotic mapless navigation tasks. The navigation experiments bridge the gap between Safe DRL and robotics, introducing a framework that allows rapid testing on real robots. Our experiments show that policies trained with the violation penalty achieve higher performance than Safe DRL baselines and significantly reduce the number of visited unsafe states.
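
The abstract describes the violation metric only at a high level, so the following is a minimal Python sketch of the idea, not the authors' implementation. It assumes a property is an axis-aligned input region paired with a forbidden discrete action, and it approximates the violation by sampling states around the current one; the names `Property`, `violation`, `penalized_reward`, and the parameters `radius`, `n_samples`, and `weight` are all hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Property:
    """Hypothetical input-output condition: when the state lies inside
    `input_box`, the policy must not select `forbidden_action`."""
    input_box: List[Tuple[float, float]]  # (low, high) bound per state dimension
    forbidden_action: int


def violation(
    policy: Callable[[Sequence[float]], int],
    state: Sequence[float],
    props: Sequence[Property],
    radius: float = 0.05,
    n_samples: int = 200,
    seed: int = 0,
) -> float:
    """Sampling-based estimate (an assumption, not formal verification):
    the fraction of states near `state` that fall in a property's input
    region and for which the policy picks the forbidden action.
    Returns the worst value over all properties, in [0, 1]."""
    rng = random.Random(seed)
    worst = 0.0
    for prop in props:
        violated = 0
        for _ in range(n_samples):
            # Perturb the state uniformly within +/- radius per dimension.
            s = [x + rng.uniform(-radius, radius) for x in state]
            in_region = all(lo <= v <= hi for v, (lo, hi) in zip(s, prop.input_box))
            if in_region and policy(s) == prop.forbidden_action:
                violated += 1
        worst = max(worst, violated / n_samples)
    return worst


def penalized_reward(reward: float, viol: float, weight: float = 1.0) -> float:
    """Subtract the weighted violation from the task reward, so no
    separate cost-value function has to be learned."""
    return reward - weight * viol


# Toy example: one state feature (distance to obstacle); action 0 = move forward.
too_close = Property(input_box=[(0.0, 0.2)], forbidden_action=0)
always_forward = lambda s: 0
print(violation(always_forward, [0.1], [too_close]))  # near the obstacle
print(violation(always_forward, [0.9], [too_close]))  # far from the obstacle
```

In this toy setup the penalty is nonzero only near the obstacle, which illustrates how a graded signal around unsafe states could replace an indicator cost that fires only after an unsafe state has been visited.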
