When Privacy Meets Partial Information: A Refined Analysis of Differentially Private Bandits
This work addresses the challenge of balancing privacy and performance in online decision-making for applications like personalized recommendations, though it is incremental as it builds on existing bandit and privacy methods.
The paper tackles the problem of multi-armed bandits with differential privacy, proving minimax and problem-dependent regret lower bounds that reveal two hardness regimes based on the privacy budget, and proposes a framework to design near-optimal private extensions of algorithms like UCB and KL-UCB, with AdaP-KLUCB matching the lower bound up to constants.
We study the problem of multi-armed bandits with $ε$-global Differential Privacy (DP). First, we prove the minimax and problem-dependent regret lower bounds for stochastic and linear bandits that quantify the hardness of bandits with $ε$-global DP. These bounds suggest the existence of two hardness regimes depending on the privacy budget $ε$. In the high-privacy regime (small $ε$), the hardness depends on a coupled effect of privacy and partial information about the reward distributions. In the low-privacy regime (large $ε$), bandits with $ε$-global DP are not harder than the bandits without privacy. For stochastic bandits, we further propose a generic framework to design a near-optimal $ε$ global DP extension of an index-based optimistic bandit algorithm. The framework consists of three ingredients: the Laplace mechanism, arm-dependent adaptive episodes, and usage of only the rewards collected in the last episode for computing private statistics. Specifically, we instantiate $ε$-global DP extensions of UCB and KL-UCB algorithms, namely AdaP-UCB and AdaP-KLUCB. AdaP-KLUCB is the first algorithm that both satisfies $ε$-global DP and yields a regret upper bound that matches the problem-dependent lower bound up to multiplicative constants.