LG | AI | CR | MA | ML | Apr 11, 2024

Differentially Private Reinforcement Learning with Self-Play

arXiv:2404.07559v1 | h-index: 8 | NIPS
Originality: Highly original
AI Analysis

This work addresses privacy protection in multi-agent RL for applications with sensitive data, representing a foundational step in this area.

The paper studies multi-agent reinforcement learning under differential privacy constraints. It extends privacy definitions to two-player zero-sum Markov Games and designs an efficient algorithm with provable regret bounds. These bounds generalize the best-known results in single-agent RL and reduce to the non-private multi-agent RL case.

We study the problem of multi-agent reinforcement learning (multi-agent RL) with differential privacy (DP) constraints. This is well-motivated by various real-world applications involving sensitive data, where it is critical to protect users' private information. We first extend the definitions of Joint DP (JDP) and Local DP (LDP) to two-player zero-sum episodic Markov Games, where both definitions ensure trajectory-wise privacy protection. We then design a provably efficient algorithm based on optimistic Nash value iteration and privatization of Bernstein-type bonuses. The algorithm satisfies JDP and LDP requirements when instantiated with appropriate privacy mechanisms. Furthermore, for both notions of DP, our regret bound generalizes the best-known result in the single-agent RL case, and it also reduces to the best-known result for multi-agent RL without privacy constraints. To the best of our knowledge, this is the first line of results towards understanding trajectory-wise privacy protection in multi-agent RL.
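
As a rough illustration of what "privatization" can mean for count-based statistics such as those feeding an exploration bonus, the sketch below adds Laplace noise to visitation counts. This is a generic Laplace mechanism, not the paper's privatizer, which this page does not describe. The function name `privatize_counts`, the `sensitivity` parameter, and the example counts are all hypothetical.

```python
import random


def privatize_counts(counts, epsilon, sensitivity=1.0, rng=None):
    """Return counts with independent Laplace(0, sensitivity / epsilon) noise added.

    Assumption: `sensitivity` bounds how much one user's trajectory can change
    the count vector in L1 norm; the caller must choose it for their setting.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    rate = epsilon / sensitivity  # exponential rate = 1 / Laplace scale
    # The difference of two i.i.d. exponentials with this rate is
    # Laplace-distributed with scale sensitivity / epsilon.
    return [c + rng.expovariate(rate) - rng.expovariate(rate) for c in counts]


# Hypothetical visitation counts for four (state, action, action) triples.
true_counts = [12, 0, 7, 31]
noisy_counts = privatize_counts(true_counts, epsilon=1.0, rng=random.Random(0))
```

Smaller `epsilon` gives stronger privacy and noisier counts. Any bonus built from such counts has to be widened to absorb the noise, which is the general reason privacy shows up as an extra term in regret bounds.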

