OC CE LG GN ML · Feb 8, 2021

Generalised correlated batched bandits via the ARC algorithm with application to dynamic pricing

arXiv:2102.04263v2 · 5.62 citations

Originality: Incremental advance

AI Analysis

This work provides an approximately optimal, computationally cheap algorithm for dynamic pricing and similar batched bandit problems, helping businesses and decision-makers optimize revenue and resource allocation.

This paper extends the Asymptotic Randomised Control (ARC) algorithm to address batched bandit problems with observations from a generalised linear model, allowing for correlated and generally distributed observations. The authors apply this extended ARC algorithm to a dynamic pricing problem using a Bayesian hierarchical model, demonstrating its superior performance compared to alternative approaches.

The Asymptotic Randomised Control (ARC) algorithm provides a rigorous approximation to the optimal strategy for a wide class of Bayesian bandits, while retaining low computational complexity. In particular, the ARC approach provides nearly optimal choices even when the payoffs are correlated or more than the reward is observed. The algorithm is guaranteed to asymptotically optimise the expected discounted payoff, with error depending on the initial uncertainty of the bandit. In this paper, we extend the ARC framework to consider a batched bandit problem where observations arrive from a generalised linear model. In particular, we develop a large sample approximation to allow correlated and generally distributed observations. We apply this to a classic dynamic pricing problem based on a Bayesian hierarchical model and demonstrate that the ARC algorithm outperforms alternative approaches.
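The abstract describes the problem class but not the ARC policy itself. As a rough illustration of the setting only, the Python sketch below simulates a batched dynamic pricing bandit with a hypothetical logistic (GLM) demand model, P(buy | p) = sigmoid(a - b·p), and a uniform prior over a discrete grid of (a, b). Because every price shares the same unknown (a, b), the payoffs of the different price "arms" are correlated, and within a batch all prices are fixed before any outcome is observed. The allocation rule is Thompson sampling, a standard baseline swapped in for illustration; it is not the ARC algorithm, the paper's Bayesian hierarchical model is not reproduced, and the price grid, parameter grid, and function names (`run`, `thompson_batch`, `update`, etc.) are assumptions.

```python
import math
import random


def sigmoid(x):
    """Logistic link of the hypothetical GLM demand model."""
    return 1.0 / (1.0 + math.exp(-x))


def best_price(a, b, prices):
    """Price maximising expected revenue p * P(buy | p) = p * sigmoid(a - b * p)."""
    return max(prices, key=lambda p: p * sigmoid(a - b * p))


def make_prior(a_values, b_values):
    """Uniform prior over an assumed discrete grid of (a, b); weights kept in log space."""
    grid = [(a, b) for a in a_values for b in b_values]
    return grid, [0.0] * len(grid)


def sample_params(grid, log_w, rng):
    """Draw one (a, b) from the current grid posterior."""
    m = max(log_w)  # subtract the max before exponentiating for numerical stability
    weights = [math.exp(lw - m) for lw in log_w]
    return rng.choices(grid, weights=weights, k=1)[0]


def update(grid, log_w, observations):
    """Exact Bayesian update with one batch of (price, bought) Bernoulli outcomes."""
    for i, (a, b) in enumerate(grid):
        for price, bought in observations:
            q = min(max(sigmoid(a - b * price), 1e-12), 1.0 - 1e-12)
            log_w[i] += math.log(q) if bought else math.log(1.0 - q)


def thompson_batch(grid, log_w, prices, batch_size, rng):
    """Thompson sampling (not ARC): every price in the batch is chosen before any outcome is seen."""
    return [best_price(*sample_params(grid, log_w, rng), prices) for _ in range(batch_size)]


def run(true_a=2.0, true_b=0.5, n_batches=30, batch_size=20, seed=0):
    """Simulate the batched pricing problem; returns total revenue and the MAP (a, b)."""
    rng = random.Random(seed)
    prices = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    grid, log_w = make_prior([0.5 * i for i in range(1, 9)],
                             [0.1 * i for i in range(1, 11)])
    revenue = 0.0
    for _ in range(n_batches):
        batch = thompson_batch(grid, log_w, prices, batch_size, rng)
        observations = []
        for p in batch:
            bought = rng.random() < sigmoid(true_a - true_b * p)
            revenue += p if bought else 0.0
            observations.append((p, bought))
        update(grid, log_w, observations)  # learning happens only between batches
    map_params = grid[max(range(len(grid)), key=lambda i: log_w[i])]
    return revenue, map_params


if __name__ == "__main__":
    total, (a_hat, b_hat) = run()
    print(f"revenue={total:.1f}, MAP a={a_hat:.2f}, b={b_hat:.2f}")
```

Only `thompson_batch` encodes the decision rule, so an ARC-style policy implemented from the paper could in principle be dropped in there while the simulation loop and batch-level posterior update stay unchanged.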
