Reinforcement Learning with Ensemble Model Predictive Safety Certification
This work addresses safety concerns that arise when deploying RL in real-world, safety-critical applications, though the contribution is incremental because it builds on existing model-based RL and control techniques.
The paper tackles unsafe exploration in reinforcement learning for safety-critical tasks. It proposes Ensemble Model Predictive Safety Certification, which combines model-based deep RL with tube-based model predictive control to correct the agent's actions, and it reports significantly fewer constraint violations than comparable reinforcement learning methods.
Reinforcement learning algorithms need exploration to learn. However, unsupervised exploration prevents the deployment of such algorithms on safety-critical tasks and thus limits their real-world use. In this paper, we propose a new algorithm called Ensemble Model Predictive Safety Certification that combines model-based deep reinforcement learning with tube-based model predictive control to correct the actions taken by a learning agent, keeping safety constraint violations at a minimum through planning. Our approach aims to reduce the amount of prior knowledge required about the actual system by needing only offline data generated by a safe controller. Our results show that we can achieve significantly fewer constraint violations than comparable reinforcement learning methods.
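
As a rough illustration of the action-correction idea described above, the following Python sketch filters a proposed action through an ensemble of dynamics models. It is not the paper's algorithm: the 1-D linear models, `backup_policy`, the candidate grid, and all numeric values are assumptions made for illustration, and a brute-force search over candidate actions stands in for the tube-based MPC optimization.

```python
from typing import Callable, Sequence

# A model maps (state, action) to a predicted next state.
Model = Callable[[float, float], float]


def make_linear_model(a: float, b: float) -> Model:
    """Hypothetical 1-D model x' = a*x + b*u, standing in for a learned network."""
    return lambda x, u: a * x + b * u


def backup_policy(x: float) -> float:
    """Assumed safe controller: proportional feedback clipped to [-1, 1]."""
    return max(-1.0, min(1.0, -0.5 * x))


def is_certified(x: float, u: float, ensemble: Sequence[Model],
                 x_max: float, horizon: int) -> bool:
    """Check `horizon` predicted states per ensemble member.

    The first state results from applying `u`; later states follow the
    backup policy. The action is certified only if every member keeps
    |state| <= x_max at every checked step.
    """
    for model in ensemble:
        state = model(x, u)
        for _ in range(horizon):
            if abs(state) > x_max:
                return False
            state = model(state, backup_policy(state))
    return True


def certify_action(x: float, u_proposed: float, ensemble: Sequence[Model],
                   x_max: float = 1.0, horizon: int = 5,
                   n_candidates: int = 21) -> float:
    """Return the certified action closest to the agent's proposal.

    Falls back to the backup policy if no candidate can be certified.
    """
    u_proposed = max(-1.0, min(1.0, u_proposed))
    # Evenly spaced grid over the action range, plus the proposal itself.
    candidates = [-1.0 + 2.0 * i / (n_candidates - 1) for i in range(n_candidates)]
    candidates.append(u_proposed)
    for u in sorted(candidates, key=lambda c: abs(c - u_proposed)):
        if is_certified(x, u, ensemble, x_max, horizon):
            return u
    return backup_policy(x)


if __name__ == "__main__":
    # Three slightly different models mimic ensemble disagreement.
    ensemble = [make_linear_model(1.0, 0.5),
                make_linear_model(1.05, 0.45),
                make_linear_model(0.95, 0.55)]
    print(certify_action(0.0, 0.3, ensemble))  # far from the constraint
    print(certify_action(0.9, 1.0, ensemble))  # close to the constraint
```

Requiring agreement across all ensemble members is one simple way to account for model uncertainty; a tube-based formulation would instead tighten the constraints around a nominal trajectory.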