LGMLJun 12, 2023

Cancellation-Free Regret Bounds for Lagrangian Approaches in Constrained Markov Decision Processes

arXiv:2306.07001v29 citationsh-index: 29
Originality Highly original
AI Analysis

This work addresses safety concerns in reinforcement learning by eliminating the need for error cancellation, which is crucial for practical applications where strict constraint satisfaction is required.

The paper tackles the problem of safe reinforcement learning in constrained Markov decision processes (CMDPs) by proposing a model-based dual algorithm, OptAug-CMDP, which achieves a regret of $ ilde{O}(\sqrt{K})$ for both objective and constraint violation over $K$ episodes without relying on error cancellation.

Constrained Markov Decision Processes (CMDPs) are one of the common ways to model safe reinforcement learning problems, where constraint functions model the safety objectives. Lagrangian-based dual or primal-dual algorithms provide efficient methods for learning in CMDPs. For these algorithms, the currently known regret bounds in the finite-horizon setting allow for a "cancellation of errors"; one can compensate for a constraint violation in one episode with a strict constraint satisfaction in another. However, we do not consider such a behavior safe in practical applications. In this paper, we overcome this weakness by proposing a novel model-based dual algorithm OptAug-CMDP for tabular finite-horizon CMDPs. Our algorithm is motivated by the augmented Lagrangian method and can be performed efficiently. We show that during $K$ episodes of exploring the CMDP, our algorithm obtains a regret of $\tilde{O}(\sqrt{K})$ for both the objective and the constraint violation. Unlike existing Lagrangian approaches, our algorithm achieves this regret without the need for the cancellation of errors.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes