AIJan 26, 2021

Ordinal Monte Carlo Tree Search

arXiv:2101.10670v1
Originality Incremental advance
AI Analysis

This addresses the challenge of reward design in reinforcement learning, particularly for game playing agents, by providing a method to handle ordinal state rankings without biased numerical encodings, though it is incremental as it builds on existing MCTS frameworks.

The paper tackles the problem of biased numerical reward signals in Monte Carlo Tree Search (MCTS) by proposing an ordinal treatment of rewards, which overcomes issues in domains with only ordinal rankings. The result shows dominance of the new ordinal MCTS algorithm over other variants in the General Video Game Playing framework, based on a novel bandit algorithm tested against UCB.

In many problem settings, most notably in game playing, an agent receives a possibly delayed reward for its actions. Often, those rewards are handcrafted and not naturally given. Even simple terminal-only rewards, like winning equals one and losing equals minus one, can not be seen as an unbiased statement, since these values are chosen arbitrarily, and the behavior of the learner may change with different encodings. It is hard to argue about good rewards and the performance of an agent often depends on the design of the reward signal. In particular, in domains where states by nature only have an ordinal ranking and where meaningful distance information between game state values is not available, a numerical reward signal is necessarily biased. In this paper we take a look at MCTS, a popular algorithm to solve MDPs, highlight a reoccurring problem concerning its use of rewards, and show that an ordinal treatment of the rewards overcomes this problem. Using the General Video Game Playing framework we show dominance of our newly proposed ordinal MCTS algorithm over other MCTS variants, based on a novel bandit algorithm that we also introduce and test versus UCB.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes