DEFENDER: DTW-Based Episode Filtering Using Demonstrations for Enhancing RL Safety
The work addresses safety concerns that arise when deploying RL agents in real-world applications, but its contribution is incremental: it adds a trajectory-filtering layer on top of existing RL algorithms.
The paper aims to improve the safety of reinforcement learning agents with a task-agnostic method that uses safe and unsafe demonstrations to filter trajectories. It reports a significant reduction in crash rates while matching or improving performance on MuJoCo benchmark tasks.
Deploying reinforcement learning agents in the real world can be challenging due to the risks associated with learning through trial and error. We propose a task-agnostic method that leverages small sets of safe and unsafe demonstrations to improve the safety of RL agents during learning. The method compares the current trajectory of the agent with both sets of demonstrations at every step, and filters the trajectory if it resembles the unsafe demonstrations. We perform ablation studies on different filtering strategies and investigate the impact of the number of demonstrations on performance. Our method is compatible with any stand-alone RL algorithm and can be applied to any task. We evaluate our method on three tasks from OpenAI Gym's MuJoCo benchmark with two state-of-the-art RL algorithms. The results demonstrate that our method significantly reduces the crash rate of the agent while converging to, and in most cases even improving on, the performance of the stand-alone agent.
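The comparison step described in the abstract can be illustrated with a minimal sketch. It uses the textbook dynamic time warping (DTW) recurrence with a Euclidean step cost. The function names `dtw_distance` and `resembles_unsafe`, and the decision rule (flag the trajectory when its nearest unsafe demonstration is closer than its nearest safe one), are assumptions made for illustration; the abstract does not specify the actual filtering criterion or what happens to a filtered trajectory.

```python
import numpy as np


def dtw_distance(a, b):
    """Textbook DTW distance between two sequences of state vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    # cost[i, j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = np.linalg.norm(a[i - 1] - b[j - 1])  # Euclidean step cost
            cost[i, j] = step + min(
                cost[i - 1, j],      # advance only in a
                cost[i, j - 1],      # advance only in b
                cost[i - 1, j - 1],  # advance in both
            )
    return float(cost[n, m])


def resembles_unsafe(trajectory, safe_demos, unsafe_demos):
    """Hypothetical rule: True if the nearest unsafe demo is closer than the nearest safe demo."""
    d_safe = min(dtw_distance(trajectory, demo) for demo in safe_demos)
    d_unsafe = min(dtw_distance(trajectory, demo) for demo in unsafe_demos)
    return d_unsafe < d_safe


# Toy 2-D states: safe demos move along the x-axis, unsafe demos along the y-axis.
safe_demos = [np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])]
unsafe_demos = [np.array([[0.0, 0.0], [0.0, 1.0], [0.0, 2.0]])]

drifting = np.array([[0.0, 0.0], [0.1, 0.9], [0.0, 2.1]])
on_track = np.array([[0.0, 0.0], [1.0, 0.1]])
```

In a training loop, a check like `resembles_unsafe(current_trajectory, safe_demos, unsafe_demos)` could be evaluated after each environment step on the states collected so far. The quadratic cost of DTW per comparison is one reason small demonstration sets are practical here.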