LGOct 15, 2025

Towards Blackwell Optimality: Bellman Optimality Is All You Can Get

arXiv:2510.13476v1
Originality Incremental advance
AI Analysis

This work addresses the challenge of achieving higher-order optimality in reinforcement learning, offering incremental improvements in algorithm design for specific MDP classes.

The paper tackles the problem of identifying policies with optimality orders up to Blackwell optimality in Markov Decision Processes, showing that finite-time identification is possible only for MDPs with a unique Bellman optimal policy and providing a tractable stopping rule for such cases.

Although average gain optimality is a commonly adopted performance measure in Markov Decision Processes (MDPs), it is often too asymptotic. Further incorporating measures of immediate losses leads to the hierarchy of bias optimalities, all the way up to Blackwell optimality. In this paper, we investigate the problem of identifying policies of such optimality orders. To that end, for each order, we construct a learning algorithm with vanishing probability of error. Furthermore, we characterize the class of MDPs for which identification algorithms can stop in finite time. That class corresponds to the MDPs with a unique Bellman optimal policy, and does not depend on the optimality order considered. Lastly, we provide a tractable stopping rule that when coupled to our learning algorithm triggers in finite time whenever it is possible to do so.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes