LGMLJun 21, 2019

Randomized Exploration in Generalized Linear Bandits

arXiv:1906.08947v3111 citations
Originality Incremental advance
AI Analysis

This work addresses exploration challenges in bandit algorithms for researchers and practitioners, offering incremental improvements in regret bounds and novel randomization techniques.

The paper tackled the problem of exploration in generalized linear bandits by proposing two randomized algorithms, GLM-TSL and GLM-FPL, which achieved regret bounds of $ ilde{O}(d \sqrt{n \log K})$ and improved upon prior work or introduced new methods for Gaussian noise perturbations.

We study two randomized algorithms for generalized linear bandits. The first, GLM-TSL, samples a generalized linear model (GLM) from the Laplace approximation to the posterior distribution. The second, GLM-FPL, fits a GLM to a randomly perturbed history of past rewards. We analyze both algorithms and derive $\tilde{O}(d \sqrt{n \log K})$ upper bounds on their $n$-round regret, where $d$ is the number of features and $K$ is the number of arms. The former improves on prior work while the latter is the first for Gaussian noise perturbations in non-linear models. We empirically evaluate both GLM-TSL and GLM-FPL in logistic bandits, and apply GLM-FPL to neural network bandits. Our work showcases the role of randomization, beyond posterior sampling, in exploration.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes