MLLGJun 11, 2020

Quantile Multi-Armed Bandits: Optimal Best-Arm Identification and a Differentially Private Scheme

arXiv:2006.06792v412 citations
Originality Incremental advance
AI Analysis

This work addresses a specific problem in bandit theory for applications involving private data, offering incremental improvements with novel algorithmic designs for quantile-based objectives.

The paper tackles the best-arm identification problem in multi-armed bandits with stochastic rewards, focusing on identifying the arm with the highest quantile, and proposes a non-private algorithm that is δ-PAC and essentially optimal up to logarithmic factors, along with a differentially private version with finite sample complexity for distributions with infinite support.

We study the best-arm identification problem in multi-armed bandits with stochastic, potentially private rewards, when the goal is to identify the arm with the highest quantile at a fixed, prescribed level. First, we propose a (non-private) successive elimination algorithm for strictly optimal best-arm identification, we show that our algorithm is $δ$-PAC and we characterize its sample complexity. Further, we provide a lower bound on the expected number of pulls, showing that the proposed algorithm is essentially optimal up to logarithmic factors. Both upper and lower complexity bounds depend on a special definition of the associated suboptimality gap, designed in particular for the quantile bandit problem, as we show when the gap approaches zero, best-arm identification is impossible. Second, motivated by applications where the rewards are private, we provide a differentially private successive elimination algorithm whose sample complexity is finite even for distributions with infinite support-size, and we characterize its sample complexity. Our algorithms do not require prior knowledge of either the suboptimality gap or other statistical information related to the bandit problem at hand.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes