AIOct 24, 2025

Investigating Scale Independent UCT Exploration Factor Strategies

Robin Schmöcker, Christoph Schnell, Alexander Dockhorn

arXiv:2510.21275v111.14 citationsh-index: 5IEEE Trans Game

Originality Incremental advance

AI Analysis

This addresses a practical issue for researchers and practitioners using UCT in games with dense rewards, but it is incremental as it builds on prior λ-strategies.

The paper tackles the problem that the UCT algorithm's performance depends on the reward scale of games, by evaluating adaptive strategies for choosing the exploration constant λ to make it scale-independent. It recommends a new strategy using λ = 2·σ, which outperforms existing methods across various tasks, achieving better single-parameter and peak performances.

The Upper Confidence Bounds For Trees (UCT) algorithm is not agnostic to the reward scale of the game it is applied to. For zero-sum games with the sparse rewards of $\{-1,0,1\}$ at the end of the game, this is not a problem, but many games often feature dense rewards with hand-picked reward scales, causing a node's Q-value to span different magnitudes across different games. In this paper, we evaluate various strategies for adaptively choosing the UCT exploration constant $λ$, called $λ$-strategies, that are agnostic to the game's reward scale. These $λ$-strategies include those proposed in the literature as well as five new strategies. Given our experimental results, we recommend using one of our newly suggested $λ$-strategies, which is to choose $λ$ as $2 \cdot σ$ where $σ$ is the empirical standard deviation of all state-action pairs' Q-values of the search tree. This method outperforms existing $λ$-strategies across a wide range of tasks both in terms of a single parameter value and the peak performances obtained by optimizing all available parameters.

View on arXiv PDF

Similar