Initial Distribution Sensitivity of Constrained Markov Decision Processes
This work addresses the complexity of solving CMDPs for practitioners when initial distributions vary, but it is incremental as it builds on existing analysis methods.
The paper tackles the problem of how the optimal value of Constrained Markov Decision Processes (CMDPs) changes with different initial state distributions, deriving bounds on these variations using duality and perturbation analysis, and shows how these bounds can analyze policy regret due to unknown distribution changes.
Constrained Markov Decision Processes (CMDPs) are notably more complex to solve than standard MDPs due to the absence of universally optimal policies across all initial state distributions. This necessitates re-solving the CMDP whenever the initial distribution changes. In this work, we analyze how the optimal value of CMDPs varies with different initial distributions, deriving bounds on these variations using duality analysis of CMDPs and perturbation analysis in linear programming. Moreover, we show how such bounds can be used to analyze the regret of a given policy due to unknown variations of the initial distribution.