LG · AI · RO · Sep 2, 2024

Revisiting Safe Exploration in Safe Reinforcement Learning

arXiv:2409.01245v1 · 1 citation · h-index: 5
Originality: Incremental advance
AI Analysis

This work addresses safe exploration in reinforcement learning, targeting applications where distinguishing severe from mild safety violations is critical. It represents an incremental improvement.

The paper tackles safe exploration in safe reinforcement learning by introducing a new metric, expected maximum consecutive cost steps (EMCC), to better assess safety during training. It validates the metric on existing benchmarks and on a new lightweight task.

Safe reinforcement learning (SafeRL) extends standard reinforcement learning with a notion of safety, typically defined by constraining the expected cost return of a trajectory to stay below a set limit. However, this metric ignores how costs accrue: it treats infrequent severe cost events the same as frequent mild ones, which can encourage riskier behavior and result in unsafe exploration. We introduce a new metric, expected maximum consecutive cost steps (EMCC), which addresses safety during training by assessing the severity of unsafe steps based on their consecutive occurrence. This metric is particularly effective for distinguishing between prolonged and occasional safety violations. We apply EMCC to both on- and off-policy algorithms to benchmark their safe exploration capability. Finally, we validate our metric on a set of benchmarks and propose a new lightweight benchmark task that allows fast evaluation during algorithm design.
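The contrast drawn in the abstract can be illustrated with a small sketch: two trajectories with identical cost return but very different longest runs of consecutive cost steps. This is not the authors' implementation. It assumes an undiscounted cost return, treats a step as a "cost step" when its cost is strictly above a threshold (0 by default), and estimates EMCC as the mean over episodes of each episode's longest such run; the function names are hypothetical.

```python
from typing import Sequence


def cost_return(costs: Sequence[float]) -> float:
    """Undiscounted cost return of one trajectory (assumption: no discounting)."""
    return float(sum(costs))


def max_consecutive_cost_steps(costs: Sequence[float], threshold: float = 0.0) -> int:
    """Length of the longest run of consecutive steps with cost > threshold.

    Counting a step as a "cost step" when its cost exceeds `threshold`
    is an assumption of this sketch, not necessarily the paper's exact rule.
    """
    longest = current = 0
    for c in costs:
        if c > threshold:
            current += 1          # extend the current violation run
            longest = max(longest, current)
        else:
            current = 0           # a safe step breaks the run
    return longest


def emcc(episodes: Sequence[Sequence[float]], threshold: float = 0.0) -> float:
    """Sample-mean estimate of EMCC over a batch of episodes (hypothetical helper)."""
    if not episodes:
        raise ValueError("need at least one episode")
    total = sum(max_consecutive_cost_steps(ep, threshold) for ep in episodes)
    return total / len(episodes)


if __name__ == "__main__":
    # Two hypothetical 10-step trajectories with the same cost return (4.0).
    scattered = [1, 0, 0, 1, 0, 0, 1, 0, 0, 1]  # four isolated violations
    prolonged = [0, 0, 0, 1, 1, 1, 1, 0, 0, 0]  # one four-step violation
    for name, ep in [("scattered", scattered), ("prolonged", prolonged)]:
        print(name, cost_return(ep), max_consecutive_cost_steps(ep))
    print("EMCC:", emcc([scattered, prolonged]))
```

Both trajectories have a cost return of 4.0, so a constraint on expected cost return cannot tell them apart. The longest violation run, however, is 1 for the scattered trajectory and 4 for the prolonged one, which is the kind of distinction EMCC is meant to capture.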

