Local Differential Privacy for Sequential Decision Making in a Changing Environment
This work addresses privacy concerns in dynamic decision-making systems, such as online recommendations or adaptive learning, but is incremental as it builds on existing bandit and privacy frameworks.
The paper tackles the problem of preserving privacy while maintaining high utility in sequential decision-making under abruptly changing environments, introducing a variant of multi-armed bandits called non-stationary stochastic corrupt bandits. It proposes the SW-KLUCB-CF algorithm with a near-optimal regret upper bound in terms of time steps and number of changes, and presents a provably optimal mechanism for local differential privacy.
We study the problem of preserving privacy while still providing high utility in sequential decision making scenarios in a changing environment. We consider abruptly changing environment: the environment remains constant during periods and it changes at unknown time instants. To formulate this problem, we propose a variant of multi-armed bandits called non-stationary stochastic corrupt bandits. We construct an algorithm called SW-KLUCB-CF and prove an upper bound on its utility using the performance measure of regret. The proven regret upper bound for SW-KLUCB-CF is near-optimal in the number of time steps and matches the best known bound for analogous problems in terms of the number of time steps and the number of changes. Moreover, we present a provably optimal mechanism which can guarantee the desired level of local differential privacy while providing high utility.