LGFeb 14, 2025

Provably Efficient RL under Episode-Wise Safety in Constrained MDPs with Linear Function Approximation

arXiv:2502.10138v34 citationsh-index: 9
Originality Highly original
AI Analysis

This work provides theoretical guarantees for provably efficient and safe reinforcement learning in CMDPs with linear function approximation, which is a significant advancement for researchers and practitioners working on safety-critical RL applications.

This paper addresses reinforcement learning in constrained Markov decision processes (CMDPs) with linear function approximation, where a single constraint on expected total utility must be satisfied per episode. The authors propose an algorithm that achieves an \tilde{\mathcal{O}}(\sqrt{K}) regret and guarantees zero constraint violation episode-wise.

We study the reinforcement learning (RL) problem in a constrained Markov decision process (CMDP), where an agent explores the environment to maximize the expected cumulative reward while satisfying a single constraint on the expected total utility value in every episode. While this problem is well understood in the tabular setting, theoretical results for function approximation remain scarce. This paper closes the gap by proposing an RL algorithm for linear CMDPs that achieves $\tilde{\mathcal{O}}(\sqrt{K})$ regret with an episode-wise zero-violation guarantee. Furthermore, our method is computationally efficient, scaling polynomially with problem-dependent parameters while remaining independent of the state space size. Our results significantly improve upon recent linear CMDP algorithms, which either violate the constraint or incur exponential computational costs.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes