MLLGOCMay 14, 2024

Thompson Sampling for Infinite-Horizon Discounted Decision Processes

arXiv:2405.08253v21 citationsh-index: 6
AI Analysis

This work addresses the challenge of defining meaningful learning metrics for decision processes with non-trivial state evolution, which is incremental but useful for broader applications of sampling algorithms.

The paper tackles the problem of evaluating policies in Markov decision processes with unknown parameters, showing that standard regret metrics can grow super-linearly and fail to capture learning in realistic settings. It introduces a new metric called expected residual regret, which forgets past actions and measures regret moving forward, and proves that Thompson sampling achieves exponential convergence to zero for this metric under certain conditions.

We model a Markov decision process, parametrized by an unknown parameter, and study the asymptotic behavior of a sampling-based algorithm, called Thompson sampling. The standard definition of regret is not always suitable to evaluate a policy, especially when the underlying chain structure is general. We show that the standard (expected) regret can grow (super-)linearly and fails to capture the notion of learning in realistic settings with non-trivial state evolution. By decomposing the standard (expected) regret, we develop a new metric, called the expected residual regret, which forgets the immutable consequences of past actions. Instead, it measures regret against the optimal reward moving forward from the current period. We show that the expected residual regret of the Thompson sampling algorithm is upper bounded by a term which converges exponentially fast to 0. We present conditions under which the posterior sampling error of Thompson sampling converges to 0 almost surely. We then introduce the probabilistic version of the expected residual regret and present conditions under which it converges to 0 almost surely. Thus, we provide a viable concept of learning for sampling algorithms which will serve useful in broader settings than had been considered previously.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes