LG · Oct 15, 2020

Local Differential Privacy for Regret Minimization in Reinforcement Learning

arXiv:2010.07778v3 · 42 citations
Originality: Incremental advance
AI Analysis

This work addresses privacy concerns in personalized services that use RL, but it is rated incremental because it builds on existing LDP frameworks.

The paper tackles the problem of protecting sensitive user data in reinforcement learning by applying local differential privacy (LDP) to finite-horizon Markov Decision Processes (MDPs). It establishes a lower bound showing that privacy has a multiplicative effect on regret, and presents an algorithm whose $\sqrt{K}/\varepsilon$ regret after $K$ episodes matches the lower bound's dependence on $K$.
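To make the $\sqrt{K}/\varepsilon$ claim concrete, the block below uses the standard definition of episodic regret for a finite-horizon MDP and shows only the dependence on $K$ and $\varepsilon$. The factors in the horizon $H$ and the numbers of states and actions are deliberately suppressed here; the paper's exact bounds carry additional terms that this sketch does not reproduce.

$$
R(K) \;=\; \sum_{k=1}^{K} \Big( V_1^{\star}(s_{1,k}) - V_1^{\pi_k}(s_{1,k}) \Big),
\qquad
R(K) \;=\; \widetilde{O}\!\left(\frac{\sqrt{K}}{\varepsilon}\right)
\ \text{(private)}
\quad\text{vs.}\quad
\widetilde{O}\!\left(\sqrt{K}\right)
\ \text{(non-private)}.
$$

Worked scaling: quadrupling the number of episodes from $K$ to $4K$ doubles the bound ($\sqrt{4K} = 2\sqrt{K}$), while halving the privacy budget from $\varepsilon$ to $\varepsilon/2$ also doubles it. The privacy cost therefore enters as a multiplicative $1/\varepsilon$ factor rather than as an additive term.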

Reinforcement learning algorithms are widely used in domains where it is desirable to provide a personalized service. In these domains it is common that user data contains sensitive information that needs to be protected from third parties. Motivated by this, we study privacy in the context of finite-horizon Markov Decision Processes (MDPs) by requiring information to be obfuscated on the user side. We formulate this notion of privacy for RL by leveraging the local differential privacy (LDP) framework. We establish a lower bound for regret minimization in finite-horizon MDPs with LDP guarantees which shows that guaranteeing privacy has a multiplicative effect on the regret. This result shows that while LDP is an appealing notion of privacy, it makes the learning problem significantly more complex. Finally, we present an optimistic algorithm that simultaneously satisfies $\varepsilon$-LDP requirements, and achieves $\sqrt{K}/\varepsilon$ regret in any finite-horizon MDP after $K$ episodes, matching the lower bound dependency on the number of episodes $K$.
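The abstract says that information is obfuscated on the user side before the learner sees it. Below is a minimal sketch of one way to do this. It assumes a Laplace mechanism applied to the per-episode visit counts, summed rewards, and transition counts of a single trajectory, with rewards in $[0, 1]$. The function names `trajectory_statistics` and `privatize`, the $(s, a, r, s')$ tuple layout, and the $6H/\varepsilon$ noise calibration are illustrative assumptions, not the paper's exact privatizer.

```python
import numpy as np


def trajectory_statistics(traj, S, A):
    """Count visits, summed rewards and transitions in one episode.

    traj: list of (s, a, r, s_next) tuples of length H, with r in [0, 1].
    (Hypothetical layout chosen for this sketch.)
    """
    visits = np.zeros((S, A))
    rewards = np.zeros((S, A))
    transitions = np.zeros((S, A, S))
    for s, a, r, s_next in traj:
        visits[s, a] += 1
        rewards[s, a] += r
        transitions[s, a, s_next] += 1
    return visits, rewards, transitions


def privatize(traj, S, A, H, epsilon, rng):
    """User-side Laplace mechanism (assumed calibration, not the paper's).

    Replacing one length-H trajectory with another changes each of the three
    statistics by at most 2H in L1 norm, so 6H bounds the total L1
    sensitivity. Adding Laplace noise with scale 6H / epsilon to every entry
    then gives epsilon-LDP for the released tuple.
    """
    visits, rewards, transitions = trajectory_statistics(traj, S, A)
    scale = 6 * H / epsilon
    return tuple(
        x + rng.laplace(0.0, scale, size=x.shape)
        for x in (visits, rewards, transitions)
    )


def aggregate(noisy_stats):
    """Server-side sum of the privatized tuples received from many users."""
    visits = sum(t[0] for t in noisy_stats)
    rewards = sum(t[1] for t in noisy_stats)
    transitions = sum(t[2] for t in noisy_stats)
    return visits, rewards, transitions


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S, A, H, eps = 3, 2, 4, 1.0
    traj = [(0, 1, 0.5, 2), (2, 0, 1.0, 1), (1, 1, 0.0, 0), (0, 0, 0.2, 2)]
    # The server only ever receives the noisy tuples, never `traj` itself.
    received = [privatize(traj, S, A, H, eps, rng) for _ in range(1000)]
    visits, rewards, transitions = aggregate(received)
    print(visits / 1000)  # close to the true per-episode visit counts
```

Summing the noisy tuples over $K$ episodes leaves zero-mean noise whose standard deviation grows like $\sqrt{K}$ per entry, while the true counts grow linearly in $K$. An optimistic learner working with such statistics has to widen its confidence bonuses to cover this extra noise, which gives some intuition for why $1/\varepsilon$ multiplies the regret.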
