OCLGSYFeb 13, 2024

Model approximation in MDPs with unbounded per-step cost

arXiv:2402.08813v15 citationsh-index: 22IEEE Transactions on Automatic Control
Originality Synthesis-oriented
AI Analysis

This work addresses a fundamental issue in control theory for researchers and practitioners dealing with model approximation in MDPs, but it is incremental as it builds on existing bounding techniques.

The paper tackles the problem of evaluating how well an optimal policy from an approximate model performs in the original Markov decision process with unbounded per-step cost, by bounding the difference in value functions and extending results with affine cost transformations and explicit dependencies on model distances.

We consider the problem of designing a control policy for an infinite-horizon discounted cost Markov decision process $\mathcal{M}$ when we only have access to an approximate model $\hat{\mathcal{M}}$. How well does an optimal policy $\hatπ^{\star}$ of the approximate model perform when used in the original model $\mathcal{M}$? We answer this question by bounding a weighted norm of the difference between the value function of $\hatπ^\star $ when used in $\mathcal{M}$ and the optimal value function of $\mathcal{M}$. We then extend our results and obtain potentially tighter upper bounds by considering affine transformations of the per-step cost. We further provide upper bounds that explicitly depend on the weighted distance between cost functions and weighted distance between transition kernels of the original and approximate models. We present examples to illustrate our results.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes