OC · SY · SY · Aug 14, 2023

Average cost optimal control under weak ergodicity hypotheses: Relative value iterations

arXiv:1902.01048 · 8 citations · h-index: 50
AI Analysis

Provides theoretical foundations for average cost optimal control in general state and action spaces, including settings where the state-dependent action space is not compact.

This paper establishes the existence of optimal ergodic occupation measures and the well-posedness of the average cost optimality equation for Markov decision processes with Polish state and action spaces under weak ergodicity hypotheses, and proves convergence of the relative value iteration algorithm.
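As a reference point, one standard way to write the average cost optimality equation and a relative value iteration recursion is sketched below. The notation is assumed for illustration and may differ from the paper's: X is the state space, U(x) the set of actions admissible at state x, c the running cost, P the controlled transition kernel, rho the optimal average cost, and x_0 a fixed reference state used to normalize the iterates.

```latex
% Average cost optimality equation (ACOE): find a constant \rho and a function V with
\rho + V(x) = \inf_{u \in \mathbb{U}(x)} \Big[ c(x,u) + \int_{\mathcal{X}} V(y)\, P(\mathrm{d}y \mid x,u) \Big],
  \qquad x \in \mathcal{X}.

% Relative value iteration, normalized at a reference state x_0:
V_{n+1}(x) = \inf_{u \in \mathbb{U}(x)} \Big[ c(x,u) + \int_{\mathcal{X}} V_n(y)\, P(\mathrm{d}y \mid x,u) \Big] - V_n(x_0).

% Formally, if V_n \to V and the limit can be passed through the infimum and the integral,
% then V(x_0) + V(x) equals the right-hand side of the ACOE, so (\rho, V) = (V(x_0), V) solves it.
```

Subtracting V_n(x_0) only shifts each iterate by a constant, so it does not change which actions attain the infimum; its role is to keep the iterates bounded instead of growing roughly like n times rho.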

We study Markov decision processes with Polish state and action spaces. The action space is state-dependent and is not necessarily compact. We first establish the existence of an optimal ergodic occupation measure using only a near-monotone hypothesis on the running cost. Then we study the well-posedness of the Bellman equation, commonly known as the average cost optimality equation, under the additional hypothesis of the existence of a small set. We deviate from the usual approach, which is based on the vanishing discount method, and instead map the problem to an equivalent one for a controlled split chain. We employ a stochastic representation of the Poisson equation to derive the Bellman equation. Next, under suitable assumptions, we establish convergence results for the 'relative value iteration' algorithm, which computes the solution of the Bellman equation recursively. In addition, we present some results concerning the stability and asymptotic optimality of the associated rolling horizon policies.
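The relative value iteration recursion can be illustrated on a toy finite MDP. This is a minimal sketch under strong simplifying assumptions, not the paper's setting or code: the state space has two states, the hypothetical TOY_MDP has a state-dependent action set, iterates are normalized as V_{n+1} = T V_n - V_n(x0) with reference state x0 = 0, and convergence here relies on the optimal chain being irreducible and aperiodic, which holds for this example. The function names, and reading a rolling horizon policy as "apply the first action of an optimal N-step plan with zero terminal cost", are assumptions made for illustration.

```python
# Relative value iteration (RVI) on a toy finite MDP.
# Illustrative sketch only: TOY_MDP, the function names, and the
# normalization V_{n+1} = T V_n - V_n(x0) are assumptions, not the paper's code.

# Hypothetical model: TOY_MDP[x] lists the actions admissible at state x
# (the action set depends on the state) as (running_cost, next_state_probs).
TOY_MDP = [
    [(1.0, [0.5, 0.5]),   # state 0, action 0: cost 1, jump to a uniformly chosen state
     (0.0, [0.0, 1.0])],  # state 0, action 1: cost 0, jump to state 1
    [(2.0, [0.5, 0.5])],  # state 1: a single admissible action, cost 2
]


def q_values(mdp, V, x):
    """c(x, u) + sum_y P(y | x, u) * V(y) for each admissible action u at x."""
    return [cost + sum(p * v for p, v in zip(probs, V)) for cost, probs in mdp[x]]


def bellman(mdp, V):
    """Undiscounted dynamic programming operator: (T V)(x) = min_u q(x, u)."""
    return [min(q_values(mdp, V, x)) for x in range(len(mdp))]


def greedy_policy(mdp, V):
    """Index of a minimizing action at each state (ties go to the lowest index)."""
    policy = []
    for x in range(len(mdp)):
        q = q_values(mdp, V, x)
        policy.append(q.index(min(q)))
    return policy


def relative_value_iteration(mdp, ref_state=0, tol=1e-10, max_iter=10_000):
    """Iterate V_{n+1} = T V_n - V_n(ref_state) starting from V_0 = 0.

    At a fixed point, rho = V(ref_state) and rho + V = T V (the finite-state ACOE).
    """
    V = [0.0] * len(mdp)
    for _ in range(max_iter):
        V_next = [tv - V[ref_state] for tv in bellman(mdp, V)]
        if max(abs(a - b) for a, b in zip(V_next, V)) < tol:
            return V_next[ref_state], V_next
        V = V_next
    raise RuntimeError("RVI did not converge within max_iter iterations")


def rolling_horizon_policy(mdp, horizon):
    """First action of an optimal `horizon`-step plan with zero terminal cost."""
    J = [0.0] * len(mdp)
    for _ in range(horizon - 1):
        J = bellman(mdp, J)  # J = T^k 0 is the optimal k-step cost-to-go
    return greedy_policy(mdp, J)


rho, V = relative_value_iteration(TOY_MDP)
print(f"average cost ~ {rho:.6f}")  # about 4/3 for this toy model
print("greedy policy:", greedy_policy(TOY_MDP, V))
print("10-step rolling horizon policy:", rolling_horizon_policy(TOY_MDP, 10))
```

For this toy model, always using action 0 at state 0 gives average cost 3/2 and always using action 1 gives 4/3, so the iteration should report rho close to 4/3 with greedy policy [1, 0]. The sketch only exhibits the recursion; it does not model the near-monotone, small-set, or split-chain conditions under which the paper proves convergence on Polish spaces.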
