LGJan 25

Entropic Risk-Aware Monte Carlo Tree Search

Pedro P. Santos, Jacopo Silvestrin, Alberto Sardinha, Francisco S. Melo

arXiv:2601.17667v11.4

Originality Incremental advance

AI Analysis

This work addresses risk-sensitive decision-making in AI, but it is incremental as it builds on existing dynamic programming formulations.

The paper tackles the problem of solving risk-aware Markov decision processes with entropic risk measure objectives by proposing a provably correct Monte Carlo tree search algorithm, showing it converges to the optimal ERM and has polynomial regret concentration.

We propose a provably correct Monte Carlo tree search (MCTS) algorithm for solving \textit{risk-aware} Markov decision processes (MDPs) with \textit{entropic risk measure} (ERM) objectives. We provide a \textit{non-asymptotic} analysis of our proposed algorithm, showing that the algorithm: (i) is \textit{correct} in the sense that the empirical ERM obtained at the root node converges to the optimal ERM; and (ii) enjoys \textit{polynomial regret concentration}. Our algorithm successfully exploits the dynamic programming formulations for solving risk-aware MDPs with ERM objectives introduced by previous works in the context of an upper confidence bound-based tree search algorithm. Finally, we provide a set of illustrative experiments comparing our risk-aware MCTS method against relevant baselines.

View on arXiv PDF

Similar