Adaptive Monte Carlo via Bandit Allocation
This work addresses the problem of choosing among competing Monte Carlo estimators, which is of interest to researchers and practitioners, and offers incremental improvements over prior adaptive methods.
The paper frames the sequential selection of unbiased Monte Carlo estimators as a stochastic multi-armed bandit problem, with the goal of minimizing the mean squared error (MSE) of the final estimate. The resulting strategies achieve an MSE that approaches that of the best estimator chosen in retrospect. The approach also extends to estimators with different costs, and the strategies provide stronger guarantees than previous approaches along with practical advantages.
We consider the problem of sequentially choosing among a set of unbiased Monte Carlo estimators to minimize the mean squared error (MSE) of a final combined estimate. By reducing this task to a stochastic multi-armed bandit problem, we show that well-developed allocation strategies can be used to achieve an MSE that approaches that of the best estimator chosen in retrospect. We then extend these developments to a scenario where alternative estimators have different, possibly stochastic costs. The outcome is a new set of adaptive Monte Carlo strategies that provide stronger guarantees than previous approaches while offering practical advantages.
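To make the reduction concrete, the following is a minimal sketch of a bandit-style allocation over unbiased estimators. It is an illustration under stated assumptions, not necessarily the algorithm analyzed in the paper. It assumes that samples are bounded, that each draw has unit cost, and that a UCB-style optimistic rule is applied to each estimator's empirical variance. Variance is the natural target here: for unbiased estimators of the same quantity, the MSE of an average of n draws is the variance divided by n. The names `bandit_monte_carlo` and `bonus_scale` are hypothetical.

```python
import math
import random


def bandit_monte_carlo(estimators, budget, rng, bonus_scale=1.0):
    """Split `budget` draws among unbiased estimators of the same quantity.

    `estimators` is a list of callables taking an rng and returning a float.
    Assumes budget >= 2 * len(estimators), bounded samples, and unit cost
    per draw. Returns (combined_estimate, draws_per_estimator).
    """
    k = len(estimators)
    counts = [0] * k
    sums = [0.0] * k
    sq_sums = [0.0] * k

    def draw(i):
        x = estimators[i](rng)
        counts[i] += 1
        sums[i] += x
        sq_sums[i] += x * x

    def optimistic_variance(i, t):
        # Empirical variance minus an exploration bonus (assumed form):
        # rarely sampled estimators look better than their data suggests.
        mean = sums[i] / counts[i]
        var = max(sq_sums[i] / counts[i] - mean * mean, 0.0)
        return var - bonus_scale * math.sqrt(2.0 * math.log(t + 1) / counts[i])

    # Two initial draws per estimator so an empirical variance is defined.
    for i in range(k):
        draw(i)
        draw(i)

    # Afterwards, always draw from the estimator that looks least noisy.
    for t in range(2 * k, budget):
        draw(min(range(k), key=lambda i: optimistic_variance(i, t)))

    # Combined estimate: the plain average of every sample drawn.
    return sum(sums) / sum(counts), counts


# Two unbiased estimators of 0.5 with very different variances.
low_noise = lambda rng: rng.uniform(0.45, 0.55)
high_noise = lambda rng: float(rng.random() < 0.5)

estimate, counts = bandit_monte_carlo(
    [low_noise, high_noise], budget=5000, rng=random.Random(0)
)
print(estimate, counts)
```

In this toy setting most of the budget should go to the low-variance estimator, so the combined average behaves much like one built from the better estimator alone. Handling estimators with different or stochastic costs, as in the extension described above, would require accounting for cost per draw, which this sketch does not attempt.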