LG · DM · Feb 9, 2025

Polynomial Regret Concentration of UCB for Non-Deterministic State Transitions

arXiv:2502.06900v1 · 1 citation · h-index: 4
Originality: Incremental advance
AI Analysis

This work targets the limited applicability of MCTS to real-world decision-making problems with probabilistic outcomes, such as those in autonomous systems and financial decision-making. The advance is incremental, since it builds on prior theoretical frameworks.

The paper extends Monte Carlo Tree Search to stochastic domains with non-deterministic state transitions. It derives polynomial regret concentration bounds for the Upper Confidence Bound (UCB) algorithm and proves that these bounds also hold in such environments, which supports robust performance in stochastic settings.

Monte Carlo Tree Search (MCTS) has proven effective in solving decision-making problems in perfect information settings. However, its application to stochastic and imperfect information domains remains limited. This paper extends the theoretical framework of MCTS to stochastic domains by addressing non-deterministic state transitions, where actions lead to probabilistic outcomes. Specifically, building on the work of Shah et al. (2020), we derive polynomial regret concentration bounds for the Upper Confidence Bound algorithm in multi-armed bandit problems with stochastic transitions, offering improved theoretical guarantees. Our primary contribution is proving that these bounds also apply to non-deterministic environments, ensuring robust performance in stochastic settings. This broadens the applicability of MCTS to real-world decision-making problems with probabilistic outcomes, such as in autonomous systems and financial decision-making.
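To make the idea of a polynomial exploration bonus concrete, here is a minimal sketch of UCB on a plain Bernoulli bandit. It is not the paper's algorithm: the bonus form `c * t**alpha / s**beta`, the parameter values, and the function name `polynomial_ucb` are all illustrative assumptions, and the sketch models only stochastic rewards, not the non-deterministic state transitions the paper analyzes.

```python
import random


def polynomial_ucb(arm_means, horizon, c=1.0, alpha=0.25, beta=0.5, seed=0):
    """Run UCB with a polynomial bonus on a Bernoulli bandit.

    All parameters are hypothetical: `arm_means` are the success
    probabilities, and the bonus is c * t**alpha / s**beta, where t is
    the round and s is the number of pulls of the arm so far.
    Returns the pull count of each arm.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k   # pulls per arm
    sums = [0.0] * k   # cumulative reward per arm

    for t in range(1, horizon + 1):
        if t <= k:
            # Pull every arm once so each empirical mean is defined.
            arm = t - 1
        else:
            # Index = empirical mean + polynomial exploration bonus.
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + c * t ** alpha / counts[i] ** beta,
            )
        # Bernoulli reward drawn from the chosen arm.
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward

    return counts


counts = polynomial_ucb([0.2, 0.8], horizon=2000)
print(counts)
```

The bonus grows polynomially in the round `t` rather than logarithmically, as in the classical UCB1 index; in this sketch the arm with the higher mean still ends up with the large majority of the pulls.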
