Algorithms for Differentially Private Multi-Armed Bandits
This addresses privacy concerns in applications like clinical trials and advertising, offering a substantial performance gain over prior work.
The paper tackles the problem of designing differentially private algorithms for the stochastic Multi-Armed Bandit (MAB) problem, achieving optimal regret of O(ε^{-1} + log T), which is a significant improvement over previous poly-log regret results.
We present differentially private algorithms for the stochastic Multi-Armed Bandit (MAB) problem. This is a problem for applications such as adaptive clinical trials, experiment design, and user-targeted advertising where private information is connected to individual rewards. Our major contribution is to show that there exist $(ε, δ)$ differentially private variants of Upper Confidence Bound algorithms which have optimal regret, $O(ε^{-1} + \log T)$. This is a significant improvement over previous results, which only achieve poly-log regret $O(ε^{-2} \log^{2} T)$, because of our use of a novel interval-based mechanism. We also substantially improve the bounds of previous family of algorithms which use a continual release mechanism. Experiments clearly validate our theoretical bounds.