LG MLJun 1, 2020

Locally Differentially Private (Contextual) Bandits Learning

Kai Zheng, Tianle Cai, Weiran Huang, Zhenguo Li, Liwei Wang

arXiv:2006.00701v420.871 citationsHas Code

Originality Incremental advance

AI Analysis

This work addresses privacy-preserving bandits learning for real-world applications, offering more practical LDP guarantees compared to weaker DP methods, though it is incremental in extending existing frameworks.

The paper tackles the problem of bandits learning under local differential privacy (LDP) by proposing black-box reduction frameworks that improve results for context-free bandits and extend to generalized linear bandits, achieving a sub-linear regret of $ ilde{O}(T^{3/4}/\varepsilon)$ and highlighting a fundamental difference between LDP and DP contextual bandits.

We study locally differentially private (LDP) bandits learning in this paper. First, we propose simple black-box reduction frameworks that can solve a large family of context-free bandits learning problems with LDP guarantee. Based on our frameworks, we can improve previous best results for private bandits learning with one-point feedback, such as private Bandits Convex Optimization, and obtain the first result for Bandits Convex Optimization (BCO) with multi-point feedback under LDP. LDP guarantee and black-box nature make our frameworks more attractive in real applications compared with previous specifically designed and relatively weaker differentially private (DP) context-free bandits algorithms. Further, we extend our $(\varepsilon, δ)$-LDP algorithm to Generalized Linear Bandits, which enjoys a sub-linear regret $\tilde{O}(T^{3/4}/\varepsilon)$ and is conjectured to be nearly optimal. Note that given the existing $Ω(T)$ lower bound for DP contextual linear bandits (Shariff & Sheffe, 2018), our result shows a fundamental difference between LDP and DP contextual bandits learning.

View on arXiv PDF Code

Similar