LG AI NE · Jan 9, 2025

On-line Policy Improvement using Monte-Carlo Search

arXiv:2501.05407v1 · 38.7279 citations · h-index: 53 · NIPS

Originality: Incremental advance

AI Analysis

This work provides a method for improving adaptive controllers, demonstrated on backgammon, with potential applications in other domains where the environment can be simulated. The advance appears incremental because it builds on existing Monte-Carlo and policy improvement techniques.

The paper addresses real-time policy improvement for adaptive controllers. A Monte-Carlo simulation algorithm statistically measures the long-term expected reward of each possible action and selects the action with the highest measured value, reducing the error rate of the base players by as much as a factor of 5 or more in backgammon.

We present a Monte-Carlo simulation algorithm for real-time policy improvement of an adaptive controller. In the Monte-Carlo simulation, the long-term expected reward of each possible action is statistically measured, using the initial policy to make decisions in each step of the simulation. The action maximizing the measured expected reward is then taken, resulting in an improved policy. Our algorithm is easily parallelizable and has been implemented on the IBM SP1 and SP2 parallel-RISC supercomputers. We have obtained promising initial results in applying this algorithm to the domain of backgammon. Results are reported for a wide variety of initial policies, ranging from a random policy to TD-Gammon, an extremely strong multi-layer neural network. In each case, the Monte-Carlo algorithm gives a substantial reduction, by as much as a factor of 5 or more, in the error rate of the base players. The algorithm is also potentially useful in many other adaptive control applications in which it is possible to simulate the environment.
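The procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy chain environment (`step`, `is_terminal`, `GOAL`), the `random_policy` base player, and the names `rollout_return` and `improved_action` are all hypothetical, chosen only to show the idea of estimating each action's long-term reward by simulation under the base policy and then taking the action with the highest estimate.

```python
import random

GOAL = 3  # terminal state of a toy chain (hypothetical, not backgammon)

def step(state, action, rng):
    """Toy dynamics: 'forward' advances with probability 0.9, 'back' retreats."""
    if action == "forward":
        next_state = state + 1 if rng.random() < 0.9 else state
    else:
        next_state = max(0, state - 1)
    return next_state, -1.0  # every move costs one unit of reward

def is_terminal(state):
    return state >= GOAL

def random_policy(state, rng):
    """Base policy to be improved: picks an action uniformly at random."""
    return rng.choice(["forward", "back"])

def rollout_return(state, first_action, base_policy, rng, max_steps=1000):
    """Simulate one episode: take first_action, then follow base_policy."""
    state, total = step(state, first_action, rng)
    for _ in range(max_steps):
        if is_terminal(state):
            break
        state, reward = step(state, base_policy(state, rng), rng)
        total += reward
    return total

def improved_action(state, actions, base_policy, n_rollouts=200, seed=0):
    """Estimate each action's expected return by Monte-Carlo rollouts
    and return the action with the highest estimate."""
    rng = random.Random(seed)
    estimates = {}
    for action in actions:
        returns = [rollout_return(state, action, base_policy, rng)
                   for _ in range(n_rollouts)]
        estimates[action] = sum(returns) / n_rollouts
    best = max(estimates, key=estimates.get)
    return best, estimates

best, estimates = improved_action(2, ["forward", "back"], random_policy)
print(best, estimates)
```

Each rollout is independent of the others, which is consistent with the abstract's remark that the algorithm is easily parallelizable: in this sketch the inner list of rollouts could be distributed across workers and the returns averaged afterwards.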
