The Price of Differential Privacy For Online Learning
This work addresses privacy concerns in online learning for applications like recommendation systems, providing significant improvements in regret bounds for bandit settings.
The paper tackles the problem of online linear optimization with differential privacy, showing that in the full-information setting, privacy can be ensured for free with O(√T) regret, while in the bandit setting, it achieves a regret of Õ(1/ε √T), improving from the previous best of Õ(1/ε T^{2/3}).
We design differentially private algorithms for the problem of online linear optimization in the full information and bandit settings with optimal $\tilde{O}(\sqrt{T})$ regret bounds. In the full-information setting, our results demonstrate that $ε$-differential privacy may be ensured for free -- in particular, the regret bounds scale as $O(\sqrt{T})+\tilde{O}\left(\frac{1}ε\right)$. For bandit linear optimization, and as a special case, for non-stochastic multi-armed bandits, the proposed algorithm achieves a regret of $\tilde{O}\left(\frac{1}ε\sqrt{T}\right)$, while the previously known best regret bound was $\tilde{O}\left(\frac{1}εT^{\frac{2}{3}}\right)$.