LG · Feb 13, 2023

Provably Safe Reinforcement Learning with Step-wise Violation Constraints

arXiv:2302.06064v3 · 11.5 · 13 citations · h-index: 10

Originality: Highly original

AI Analysis

The paper targets safety-critical applications such as robot control and autonomous driving by requiring safety at every decision step, a novel formulation in safe RL.

The paper tackles safe reinforcement learning with strict step-wise violation constraints and no safe action assumption, proposing algorithms SUCBVI and SRF-UCRL that achieve provable bounds on violation and regret, with experimental validation.

In this paper, we investigate a novel safe reinforcement learning problem with step-wise violation constraints. Our problem differs from existing works in that we consider stricter step-wise violation constraints and do not assume the existence of safe actions, making our formulation more suitable for safety-critical applications which need to ensure safety in all decision steps and may not always possess safe actions, e.g., robot control and autonomous driving. We propose a novel algorithm SUCBVI, which guarantees $\widetilde{O}(\sqrt{ST})$ step-wise violation and $\widetilde{O}(\sqrt{H^3SAT})$ regret. Lower bounds are provided to validate the optimality in both violation and regret performance with respect to $S$ and $T$. Moreover, we further study a novel safe reward-free exploration problem with step-wise violation constraints. For this problem, we design an $(\varepsilon,\delta)$-PAC algorithm SRF-UCRL, which achieves nearly state-of-the-art sample complexity $\widetilde{O}((\frac{S^2AH^2}{\varepsilon}+\frac{H^4SA}{\varepsilon^2})(\log(\frac{1}{\delta})+S))$, and guarantees $\widetilde{O}(\sqrt{ST})$ violation during the exploration. The experimental results demonstrate the superiority of our algorithms in safety performance, and corroborate our theoretical results.
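To get a feel for how the stated bounds scale, the sketch below evaluates the expressions inside the $\widetilde{O}(\cdot)$ terms directly. It is only an illustration: constants and polylogarithmic factors hidden by $\widetilde{O}$ are dropped, the function names are hypothetical, and the numeric inputs are arbitrary values chosen for the example, not settings from the paper.

```python
import math


def violation_bound(S: int, T: int) -> float:
    """sqrt(S*T): the term inside the step-wise violation bound (no constants/logs)."""
    return math.sqrt(S * T)


def regret_bound(H: int, S: int, A: int, T: int) -> float:
    """sqrt(H^3 * S * A * T): the term inside the regret bound (no constants/logs)."""
    return math.sqrt(H**3 * S * A * T)


def sample_complexity(S: int, A: int, H: int, eps: float, delta: float) -> float:
    """(S^2 A H^2 / eps + H^4 S A / eps^2) * (log(1/delta) + S), without constants/logs."""
    return (S**2 * A * H**2 / eps + H**4 * S * A / eps**2) * (math.log(1 / delta) + S)


if __name__ == "__main__":
    # Arbitrary illustrative sizes (assumptions, not taken from the paper).
    S, A, H = 10, 4, 5
    for T in (10_000, 40_000, 160_000):
        v = violation_bound(S, T)
        r = regret_bound(H, S, A, T)
        # Both terms grow like sqrt(T), so dividing by T gives a shrinking ratio.
        print(f"T={T:>7}  violation~{v:9.1f}  regret~{r:10.1f}  violation/T~{v / T:.5f}")
    print(f"sample complexity term: {sample_complexity(S, A, H, eps=0.1, delta=0.05):.3e}")
```

Because both terms scale as $\sqrt{T}$, quadrupling $T$ doubles each of them, while the ratio of either term to $T$ shrinks toward zero.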

