LGNov 12, 2017

Skyline Identification in Multi-Armed Bandits

arXiv:1711.04213v2
Originality Incremental advance
AI Analysis

This work addresses a specific variant of the PAC multi-armed bandit problem, providing theoretical insights into sample complexity for researchers in bandit algorithms, but it is incremental as it builds on prior work like best arm identification.

The paper tackles the problem of identifying the skyline in multi-armed bandits, where arms are ordered and the goal is to find those with higher expected rewards than all lower-numbered arms, by proving matching upper and lower bounds for sample complexity, showing that Θ(n/ε² * min{log(1/(εδ)), log(n/δ)}) samples are necessary and sufficient to identify an ε-approximate skyline with probability 1-δ.

We introduce a variant of the classical PAC multi-armed bandit problem. There is an ordered set of $n$ arms $A[1],\dots,A[n]$, each with some stochastic reward drawn from some unknown bounded distribution. The goal is to identify the $skyline$ of the set $A$, consisting of all arms $A[i]$ such that $A[i]$ has larger expected reward than all lower-numbered arms $A[1],\dots,A[i-1]$. We define a natural notion of an $\varepsilon$-approximate skyline and prove matching upper and lower bounds for identifying an $\varepsilon$-skyline. Specifically, we show that in order to identify an $\varepsilon$-skyline from among $n$ arms with probability $1-δ$, $$ Θ\bigg(\frac{n}{\varepsilon^2} \cdot \min\bigg\{ \log\bigg(\frac{1}{\varepsilon δ}\bigg), \log\bigg(\frac{n}δ\bigg) \bigg\} \bigg) $$ samples are necessary and sufficient. When $\varepsilon \gg 1/n$, our results improve over the naive algorithm, which draws enough samples to approximate the expected reward of every arm; the algorithm of (Auer et al., AISTATS'16) for Pareto-optimal arm identification is likewise superseded. Our results show that the sample complexity of the skyline problem lies strictly in between that of best arm identification (Even-Dar et al., COLT'02) and that of approximating the expected reward of every arm.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes