LG, Sep 26, 2015

Algorithms for Linear Bandits on Polyhedral Sets

arXiv:1509.07927v1, 1 citation
Originality: Incremental advance
AI Analysis

This paper addresses efficient algorithms for stochastic linear bandit optimization over polyhedral arm sets, with incremental improvements in regret bounds.

The paper tackles the stochastic linear bandit problem with arms constrained to a polyhedron, establishing a lower bound of Ω(N log T) and providing a nearly optimal algorithm with O(N log^{1+ε}(T)) regret, resolving an open question from prior work.

We study the stochastic linear optimization problem with bandit feedback. The arms take values in an $N$-dimensional space and belong to a bounded polyhedron described by finitely many linear inequalities. We provide a lower bound for the expected regret that scales as $\Omega(N\log T)$. We then provide a nearly optimal algorithm and show that its expected regret scales as $O(N\log^{1+\epsilon}(T))$ for an arbitrarily small $\epsilon>0$. The algorithm alternates sequentially between exploration and exploitation intervals: a deterministic set of arms is played in the exploration intervals, and a greedily selected arm is played in the exploitation intervals. We also develop an algorithm that achieves the optimal regret when the sub-Gaussianity parameter of the noise term is known. Our key insight is that, for a polyhedron, the optimal arm is robust to small perturbations in the reward function. Consequently, a greedily selected arm is guaranteed to be optimal once the estimation error falls below a suitable threshold. Our solution resolves a question posed by Rusmevichientong and Tsitsiklis (2011) that left open the possibility of efficient algorithms with asymptotic logarithmic regret bounds. We also show that the regret upper bounds hold with probability $1$. Our numerical investigations show that, although the theoretical results are asymptotic, the performance of our algorithms compares favorably to state-of-the-art algorithms in finite time as well.
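The alternation between exploration and exploitation intervals described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's algorithm. It assumes the polyhedron is the unit box $[0,1]^N$, so the greedy arm has a closed form; it assumes the deterministic exploration set is the unit vectors $e_1,\dots,e_N$; and it assumes exploitation intervals that double in length, a schedule chosen only for illustration. The names `greedy_arm` and `run_explore_exploit` are hypothetical, and no claim is made that this schedule attains the regret bounds stated above.

```python
import random


def greedy_arm(theta_hat):
    """Maximize <theta_hat, x> over the box [0, 1]^N (assumed polyhedron).

    A linear objective over a box is maximized at a vertex: set x_i = 1
    where the estimated coefficient is positive and x_i = 0 otherwise.
    """
    return [1.0 if t > 0 else 0.0 for t in theta_hat]


def run_explore_exploit(theta, horizon, noise_sd=0.1, seed=0):
    """Simulate alternating exploration/exploitation intervals.

    theta    : true (hidden) reward vector, used only to simulate rewards
               and to measure regret
    horizon  : total number of rounds T
    noise_sd : standard deviation of the Gaussian reward noise (assumption)
    Returns (cumulative expected regret, final estimate, final greedy arm).
    """
    rng = random.Random(seed)
    n = len(theta)
    best = greedy_arm(theta)
    best_reward = sum(t * x for t, x in zip(theta, best))

    sums = [0.0] * n    # sum of observed rewards for exploration arm e_i
    counts = [0] * n    # number of times e_i has been played
    theta_hat = [0.0] * n
    regret = 0.0
    t = 0
    cycle = 0

    while t < horizon:
        cycle += 1
        # Exploration interval: play each unit vector e_i once.
        # The expected reward of e_i is theta[i].
        for i in range(n):
            if t >= horizon:
                break
            sums[i] += theta[i] + rng.gauss(0.0, noise_sd)
            counts[i] += 1
            regret += best_reward - theta[i]
            t += 1
        # Estimate each coordinate by the sample mean of its exploration rewards.
        theta_hat = [s / c if c else 0.0 for s, c in zip(sums, counts)]

        # Exploitation interval: play the greedy arm for 2**cycle rounds
        # (doubling schedule is an assumption of this sketch).
        arm = greedy_arm(theta_hat)
        arm_reward = sum(th * x for th, x in zip(theta, arm))
        rounds = min(2 ** cycle, horizon - t)
        regret += rounds * (best_reward - arm_reward)
        t += rounds

    return regret, theta_hat, greedy_arm(theta_hat)


if __name__ == "__main__":
    regret, theta_hat, arm = run_explore_exploit([0.8, -0.6, 0.5], horizon=5000)
    print("regret:", regret)
    print("estimate:", theta_hat)
    print("greedy arm:", arm)
```

In this toy setting the robustness property is easy to see: the greedy arm depends only on the signs of the estimated coefficients, so it coincides with the optimal vertex as soon as every coordinate's estimation error is smaller than the magnitude of the corresponding true coefficient. From that point on, exploitation rounds add no expected regret, and only the exploration rounds contribute.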

