LG MLJun 21, 2019

Randomized Exploration in Generalized Linear Bandits

Branislav Kveton, Manzil Zaheer, Csaba Szepesvari, Lihong Li, Mohammad Ghavamzadeh, Craig Boutilier

arXiv:1906.08947v324.1112 citations

Originality Incremental advance

AI Analysis

This work addresses exploration challenges in bandit algorithms for researchers and practitioners, offering incremental improvements in regret bounds and novel randomization techniques.

The paper tackled the problem of exploration in generalized linear bandits by proposing two randomized algorithms, GLM-TSL and GLM-FPL, which achieved regret bounds of $ ilde{O}(d \sqrt{n \log K})$ and improved upon prior work or introduced new methods for Gaussian noise perturbations.

We study two randomized algorithms for generalized linear bandits. The first, GLM-TSL, samples a generalized linear model (GLM) from the Laplace approximation to the posterior distribution. The second, GLM-FPL, fits a GLM to a randomly perturbed history of past rewards. We analyze both algorithms and derive $\tilde{O}(d \sqrt{n \log K})$ upper bounds on their $n$-round regret, where $d$ is the number of features and $K$ is the number of arms. The former improves on prior work while the latter is the first for Gaussian noise perturbations in non-linear models. We empirically evaluate both GLM-TSL and GLM-FPL in logistic bandits, and apply GLM-FPL to neural network bandits. Our work showcases the role of randomization, beyond posterior sampling, in exploration.

View on arXiv PDF

Similar