LGOCMay 22, 2022

Policy-based Primal-Dual Methods for Concave CMDP with Variance Reduction

Berkeley
arXiv:2205.10715v45 citationsh-index: 38
Originality Incremental advance
AI Analysis

This addresses optimization challenges in reinforcement learning for constrained decision-making problems, representing an incremental improvement with specific algorithmic enhancements.

The paper tackles the problem of solving Concave Constrained Markov Decision Processes (Concave CMDPs) with non-additive concave objectives and constraints, proposing the VR-PDPG algorithm that achieves an O(T^{-1/3}) convergence rate in exact settings and an O(ε^{-4}) sample complexity for ε-global optimality in sample-based settings, with zero constraint violation possible via a pessimistic term.

We study Concave Constrained Markov Decision Processes (Concave CMDPs) where both the objective and constraints are defined as concave functions of the state-action occupancy measure. We propose the Variance-Reduced Primal-Dual Policy Gradient Algorithm (VR-PDPG), which updates the primal variable via policy gradient ascent and the dual variable via projected sub-gradient descent. Despite the challenges posed by the loss of additivity structure and the nonconcave nature of the problem, we establish the global convergence of VR-PDPG by exploiting a form of hidden concavity. In the exact setting, we prove an $O(T^{-1/3})$ convergence rate for both the average optimality gap and constraint violation, which further improves to $O(T^{-1/2})$ under strong concavity of the objective in the occupancy measure. In the sample-based setting, we demonstrate that VR-PDPG achieves an $\widetilde{O}(ε^{-4})$ sample complexity for $ε$-global optimality. Moreover, by incorporating a diminishing pessimistic term into the constraint, we show that VR-PDPG can attain a zero constraint violation without compromising the convergence rate of the optimality gap. Finally, we validate the effectiveness of our methods through numerical experiments.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes