ML LGMay 22, 2019

An Optimal Private Stochastic-MAB Algorithm Based on an Optimal Private Stopping Rule

arXiv:1905.09383v111.855 citations

Originality Highly original

AI Analysis

This provides a provably optimal solution for privacy-preserving decision-making in bandit problems, addressing a gap in prior work.

The authors tackled the problem of designing a differentially private algorithm for the stochastic multi-arm bandit that meets known lower bounds, achieving optimal regret of O(K log(T)/ε) as proven theoretically and demonstrated empirically.

We present a provably optimal differentially private algorithm for the stochastic multi-arm bandit problem, as opposed to the private analogue of the UCB-algorithm [Mishra and Thakurta, 2015; Tossou and Dimitrakakis, 2016] which doesn't meet the recently discovered lower-bound of $Ω\left(\frac{K\log(T)}ε \right)$ [Shariff and Sheffet, 2018]. Our construction is based on a different algorithm, Successive Elimination [Even-Dar et al. 2002], that repeatedly pulls all remaining arms until an arm is found to be suboptimal and is then eliminated. In order to devise a private analogue of Successive Elimination we visit the problem of private stopping rule, that takes as input a stream of i.i.d samples from an unknown distribution and returns a multiplicative $(1 \pm α)$-approximation of the distribution's mean, and prove the optimality of our private stopping rule. We then present the private Successive Elimination algorithm which meets both the non-private lower bound [Lai and Robbins, 1985] and the above-mentioned private lower bound. We also compare empirically the performance of our algorithm with the private UCB algorithm.

View on arXiv PDF

Similar