LG · May 6

Using Common Random Numbers for Simulation-based Planning with Rollouts

Sandarbh Yadav, Frederic J Maliakkal, Harshad Khadilkar, Shivaram Kalyanakrishnan

arXiv:2605.04732 · 8.7 · h-index: 19

AI Analysis

For practitioners of simulation-based planning, this work offers a simple, provable variance reduction technique that enhances decision quality in stochastic environments.

This paper proposes using common random numbers in simulation-based planning with rollouts to reduce variance in relative utility estimates, leading to improved task performance in synthetic experiments and practical applications like pension-disbursement planning and UCT for Ludo.

Simulation-based planning with rollouts is a widely deployed technique for decision making in stochastic environments. The primary instrument of simulation-based planning is a sampling model, which is repeatedly called to generate trajectories and estimate the utilities of available actions. Among the actions thus explored, one with the maximum estimated utility is then executed. In this paper, we examine the effect of using common random numbers in the simulation process. We obtain a simple recipe for (provably) reducing variance in relative utility when simulations invoke a rollout policy beyond some depth. Experiments on synthetic tasks confirm that our scheme improves task performance. The broader significance of our innovation is apparent from two practical applications: (1) single-step lookahead planning in a pension-disbursement task, and (2) a deployment of the well-known UCT algorithm for the game of Ludo.
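The general idea of common random numbers (CRN) in rollout-based action selection can be illustrated with a minimal sketch. This is the textbook CRN technique, not the authors' specific recipe, which concerns simulations that invoke a rollout policy beyond some depth. Everything here is a hypothetical assumption: the toy simulator `simulate_return`, its dynamics and rewards, and the helpers `estimate_utility`, `difference_variance` and `plan`. The sketch evaluates each action on a list of seeds. When both actions share the same seeds, noise common to their trajectories cancels in the estimated utility *difference*, which is the quantity that decides which action gets executed.

```python
import random
import statistics


def simulate_return(state: float, action: int, rng: random.Random, depth: int = 20) -> float:
    """Hypothetical toy sampling model: one rollout of fixed depth after `action`."""
    s = state + (0.1 if action == 1 else 0.0)  # assumption: action 1 is slightly better
    total = 0.0
    for _ in range(depth):
        noise = rng.gauss(0.0, 1.0)
        total += -abs(s - 1.0) + noise        # assumed reward, shared noise term
        s = 0.9 * s + 0.1 * noise             # assumed rollout-policy dynamics
    return total


def estimate_utility(state: float, action: int, seeds: list[int]) -> float:
    """Monte Carlo utility estimate: one rollout per seed, averaged."""
    return statistics.fmean(
        simulate_return(state, action, random.Random(seed)) for seed in seeds
    )


def difference_variance(common: bool, n_rollouts: int = 50, n_trials: int = 200,
                        state: float = 0.0) -> float:
    """Variance of the estimated U(1) - U(0) across repeated planning calls."""
    meta = random.Random(12345)
    diffs = []
    for _ in range(n_trials):
        seeds0 = [meta.randrange(2**32) for _ in range(n_rollouts)]
        # With CRN both actions reuse the same seeds; otherwise draw fresh ones.
        seeds1 = seeds0 if common else [meta.randrange(2**32) for _ in range(n_rollouts)]
        diffs.append(estimate_utility(state, 1, seeds1) - estimate_utility(state, 0, seeds0))
    return statistics.variance(diffs)


def plan(state: float, actions: tuple[int, ...] = (0, 1), n_rollouts: int = 50,
         seed: int = 0) -> int:
    """Single-step lookahead: evaluate every action on the SAME seeds, pick the argmax."""
    rng = random.Random(seed)
    seeds = [rng.randrange(2**32) for _ in range(n_rollouts)]
    return max(actions, key=lambda a: estimate_utility(state, a, seeds))
```

Comparing `difference_variance(common=True)` with `difference_variance(common=False)` shows the effect. In this toy model the common-seed variance should be far smaller, because the per-step noise enters both returns identically and drops out of their difference. Seeding per rollout, rather than sharing one generator, keeps the random streams aligned across actions even when they consume different numbers of draws.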

