MLCRLGNov 27, 2015

Algorithms for Differentially Private Multi-Armed Bandits

arXiv:1511.08681v1120 citations
Originality Highly original
AI Analysis

This addresses privacy concerns in applications like clinical trials and advertising, offering a substantial performance gain over prior work.

The paper tackles the problem of designing differentially private algorithms for the stochastic Multi-Armed Bandit (MAB) problem, achieving optimal regret of O(ε^{-1} + log T), which is a significant improvement over previous poly-log regret results.

We present differentially private algorithms for the stochastic Multi-Armed Bandit (MAB) problem. This is a problem for applications such as adaptive clinical trials, experiment design, and user-targeted advertising where private information is connected to individual rewards. Our major contribution is to show that there exist $(ε, δ)$ differentially private variants of Upper Confidence Bound algorithms which have optimal regret, $O(ε^{-1} + \log T)$. This is a significant improvement over previous results, which only achieve poly-log regret $O(ε^{-2} \log^{2} T)$, because of our use of a novel interval-based mechanism. We also substantially improve the bounds of previous family of algorithms which use a continual release mechanism. Experiments clearly validate our theoretical bounds.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes