LG · Feb 21, 2024

Performance Improvement Bounds for Lipschitz Configurable Markov Decision Processes

arXiv:2402.13821v1 · 1 citation · h-index: 15

Originality: Incremental advance

AI Analysis

This work provides theoretical bounds for Conf-MDPs, which model environments with configurable parameters, but it is an incremental advance, since it extends existing results.

The paper tackles the problem of bounding performance improvements in configurable Markov decision processes (Conf-MDPs) under Lipschitz continuity, deriving a novel lower bound for performance improvement.

Configurable Markov Decision Processes (Conf-MDPs) have recently been introduced as an extension of traditional Markov Decision Processes (MDPs) to model real-world scenarios in which it is possible to intervene in the environment to configure some of its parameters. In this paper, we focus on a particular subclass of Conf-MDPs that satisfies a regularity condition, namely Lipschitz continuity. We start by providing a bound on the Wasserstein distance between the $\gamma$-discounted stationary distributions induced by changing the policy and the configuration. This result generalizes existing bounds for both Conf-MDPs and traditional MDPs. Then, we derive a novel lower bound on performance improvement.
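
The two quantities the abstract bounds can be illustrated numerically. The sketch below is a minimal illustration under stated assumptions, not the paper's method or its bound. It uses a hypothetical three-state chain (the helpers `make_config` and `make_policy` and the parameters `theta` and `alpha` are invented for illustration), computes the $\gamma$-discounted stationary distribution as the fixed point of $d = (1-\gamma)\mu + \gamma\, d P$, where $P(s' \mid s) = \sum_a \pi(a \mid s)\, p(s' \mid s, a)$ is the kernel induced by policy $\pi$ and configuration $p$, and measures the Wasserstein-1 distance between two such distributions, assuming the states lie on a line with metric $|i - j|$.

```python
def induced_kernel(policy, config):
    # P(s' | s) = sum_a policy[s][a] * config[s][a][s']
    n = len(policy)
    return [[sum(policy[s][a] * config[s][a][t] for a in range(len(policy[s])))
             for t in range(n)] for s in range(n)]


def discounted_distribution(mu, P, gamma, iters=2000):
    # Fixed-point iteration for d = (1 - gamma) * mu + gamma * d P;
    # the map is a gamma-contraction in L1, so it converges for gamma < 1.
    n = len(mu)
    d = list(mu)
    for _ in range(iters):
        d = [(1 - gamma) * mu[j] + gamma * sum(d[i] * P[i][j] for i in range(n))
             for j in range(n)]
    return d


def wasserstein_1d(p, q):
    # W1 between distributions on states 0..n-1 with metric |i - j|:
    # the sum of absolute differences between the two CDFs.
    total, cdf_gap = 0.0, 0.0
    for k in range(len(p) - 1):
        cdf_gap += p[k] - q[k]
        total += abs(cdf_gap)
    return total


def make_config(theta):
    # Hypothetical configurable transition model on states {0, 1, 2}:
    # action 0 = stay; action 1 = advance one state with success probability theta.
    # State 2 is absorbing under both actions.
    return [
        [[1.0, 0.0, 0.0], [1 - theta, theta, 0.0]],
        [[0.0, 1.0, 0.0], [0.0, 1 - theta, theta]],
        [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]],
    ]


def make_policy(alpha):
    # Hypothetical stationary policy choosing "advance" with probability alpha.
    return [[1 - alpha, alpha] for _ in range(3)]


mu = [1.0, 0.0, 0.0]  # start in state 0
gamma = 0.9
d_old = discounted_distribution(mu, induced_kernel(make_policy(0.5), make_config(0.5)), gamma)
d_new = discounted_distribution(mu, induced_kernel(make_policy(0.8), make_config(0.7)), gamma)
print("d_old =", [round(x, 4) for x in d_old])
print("d_new =", [round(x, 4) for x in d_new])
print("W1 =", round(wasserstein_1d(d_old, d_new), 4))
```

Changing `alpha` perturbs the policy and changing `theta` perturbs the configuration. In this toy model both raise the probability of advancing, so the discounted mass shifts toward the absorbing state, and the W1 distance quantifies that shift.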
