Accelerating Approximate Thompson Sampling with Underdamped Langevin Monte Carlo
This work addresses computational bottlenecks in high-dimensional bandit problems, representing an incremental improvement over existing methods.
The paper tackles the scalability issues of approximate Thompson sampling in high-dimensional problems by proposing a strategy using underdamped Langevin Monte Carlo, which improves sample complexity for logarithmic regrets from O(d) to O(√d).
Approximate Thompson sampling with Langevin Monte Carlo broadens its reach from Gaussian posterior sampling to encompass more general smooth posteriors. However, it still encounters scalability issues in high-dimensional problems when demanding high accuracy. To address this, we propose an approximate Thompson sampling strategy, utilizing underdamped Langevin Monte Carlo, where the latter is the go-to workhorse for simulations of high-dimensional posteriors. Based on the standard smoothness and log-concavity conditions, we study the accelerated posterior concentration and sampling using a specific potential function. This design improves the sample complexity for realizing logarithmic regrets from $\mathcal{\tilde O}(d)$ to $\mathcal{\tilde O}(\sqrt{d})$. The scalability and robustness of our algorithm are also empirically validated through synthetic experiments in high-dimensional bandit problems.