LGDSMay 23, 2024

Deterministic Policies for Constrained Reinforcement Learning in Polynomial Time

arXiv:2405.14183v23 citationsh-index: 4NIPS
Originality Highly original
AI Analysis

This provides a solution for researchers and practitioners needing efficient algorithms for constrained reinforcement learning with various constraint types, representing a significant theoretical advance rather than an incremental improvement.

The paper tackles the problem of efficiently computing near-optimal deterministic policies for constrained reinforcement learning, achieving a fully polynomial-time approximation scheme for time-space recursive cost criteria, which answers three open questions in the field.

We present a novel algorithm that efficiently computes near-optimal deterministic policies for constrained reinforcement learning (CRL) problems. Our approach combines three key ideas: (1) value-demand augmentation, (2) action-space approximate dynamic programming, and (3) time-space rounding. Our algorithm constitutes a fully polynomial-time approximation scheme (FPTAS) for any time-space recursive (TSR) cost criteria. A TSR criteria requires the cost of a policy to be computable recursively over both time and (state) space, which includes classical expectation, almost sure, and anytime constraints. Our work answers three open questions spanning two long-standing lines of research: polynomial-time approximability is possible for 1) anytime-constrained policies, 2) almost-sure-constrained policies, and 3) deterministic expectation-constrained policies.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes