ML, May 24, 2017

Boundary Crossing Probabilities for General Exponential Families

arXiv:1705.08814v16.014 citations, h-index: 17

Originality: Incremental advance

AI Analysis

This work solves a foundational problem in multi-armed bandit theory by generalizing key results to higher dimensions, which is incremental but crucial for analyzing state-of-the-art algorithms.

The paper tackles the problem of bounding boundary crossing probabilities for exponential families of arbitrary finite dimension, extending previous results limited to dimension one, and provides a concentration inequality that enables analyzing the regret of the KL-UCB and KL-UCB+ strategies in multi-armed bandits.

We consider parametric exponential families of dimension $K$ on the real line. We study a variant of \textit{boundary crossing probabilities} coming from the multi-armed bandit literature, in the case when the real-valued distributions form an exponential family of dimension $K$. Formally, our result is a concentration inequality that bounds the probability that $\mathcal{B}^ψ(\hat θ_n,θ^\star)\geq f(t/n)/n$, where $θ^\star$ is the parameter of an unknown target distribution, $\hat θ_n$ is the empirical parameter estimate built from $n$ observations, $ψ$ is the log-partition function of the exponential family and $\mathcal{B}^ψ$ is the corresponding Bregman divergence. From the perspective of stochastic multi-armed bandits, we pay special attention to the case when the boundary function $f$ is logarithmic, as it enables the analysis of the regret of the state-of-the-art KL-UCB and KL-UCB+ strategies, whose analysis was left open in such generality. Indeed, previous results only hold for the case when $K=1$, while we provide results for arbitrary finite dimension $K$, thus considerably extending the existing results. Perhaps surprisingly, we highlight that the proof techniques to achieve these strong results already existed three decades ago in the work of T.L. Lai, and were apparently forgotten in the bandit community. We provide a modern rewriting of these beautiful techniques that we believe are useful beyond the application to stochastic multi-armed bandits.
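To make the quantity in the abstract concrete, here is a minimal Python sketch of the Bregman divergence of a log-partition function and of the crossing event $\mathcal{B}^ψ \geq f(t/n)/n$. It is an illustration under stated assumptions, not the paper's method: the argument order `bregman(theta, theta_ref)` follows the standard definition $ψ(θ) - ψ(θ') - \langle θ - θ', \nabla ψ(θ')\rangle$ and may differ from the paper's convention, the Bernoulli family (dimension $K=1$) is chosen only because its divergence can be checked against the closed-form KL divergence, and the boundary function `f = math.log` is a hypothetical stand-in for "a logarithmic $f$". All function names are invented for this example.

```python
import math

def bregman(psi, grad_psi, theta, theta_ref):
    """Standard Bregman divergence of psi between two parameter vectors:
    psi(theta) - psi(theta_ref) - <theta - theta_ref, grad psi(theta_ref)>."""
    inner = sum((a - b) * g
                for a, b, g in zip(theta, theta_ref, grad_psi(theta_ref)))
    return psi(theta) - psi(theta_ref) - inner

# Bernoulli family in natural parametrization (assumed example, K = 1):
# log-partition psi(theta) = log(1 + exp(theta)), mean = sigmoid(theta).
def psi_bernoulli(theta):
    return math.log1p(math.exp(theta[0]))

def grad_psi_bernoulli(theta):
    return [1.0 / (1.0 + math.exp(-theta[0]))]

def logit(p):
    return math.log(p / (1.0 - p))

def kl_bernoulli(p, q):
    """Closed-form KL divergence between Bernoulli(p) and Bernoulli(q)."""
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))

def crosses_boundary(divergence, n, t, f=math.log):
    """Crossing event: divergence >= f(t/n)/n (f is an assumed choice)."""
    return divergence >= f(t / n) / n

p, q = 0.3, 0.6
# With this argument order, the divergence equals KL(Bernoulli(p) || Bernoulli(q)).
b = bregman(psi_bernoulli, grad_psi_bernoulli, [logit(q)], [logit(p)])
```

For a family of dimension $K > 1$, the same `bregman` function applies unchanged once `psi` and `grad_psi` accept and return length-$K$ vectors.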
