DSAILGFeb 11, 2025

Polynomial-Time Approximability of Constrained Reinforcement Learning

arXiv:2502.07764v11 citationsh-index: 4ICML
Originality Highly original
AI Analysis

This work addresses the problem of efficiently approximating constrained reinforcement learning policies for researchers and practitioners, providing foundational insights into computational limits and enabling broader applications in AI and control systems.

The paper tackles the computational complexity of approximating constrained Markov decision processes by designing a polynomial-time bicriteria approximation algorithm for optimal policies under various constraints, achieving optimal guarantees under the assumption P ≠ NP and resolving several open questions in constrained reinforcement learning.

We study the computational complexity of approximating general constrained Markov decision processes. Our primary contribution is the design of a polynomial time $(0,ε)$-additive bicriteria approximation algorithm for finding optimal constrained policies across a broad class of recursively computable constraints, including almost-sure, chance, expectation, and their anytime variants. Matching lower bounds imply our approximation guarantees are optimal so long as $P \neq NP$. The generality of our approach results in answers to several long-standing open complexity questions in the constrained reinforcement learning literature. Specifically, we are the first to prove polynomial-time approximability for the following settings: policies under chance constraints, deterministic policies under multiple expectation constraints, policies under non-homogeneous constraints (i.e., constraints of different types), and policies under constraints for continuous-state processes.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes