Differentially Private Policy Gradient
This work addresses privacy concerns in real-world RL applications by introducing a differentially private policy gradient algorithm whose results improve on existing DP algorithms in online RL.
The paper tackles the problem of ensuring differential privacy in reinforcement learning on personal data. It introduces a differentially private policy gradient algorithm in which the privacy mechanism reduces to the computation of appropriate trust regions, so the theoretical properties of the non-private method are preserved. It reports significant improvements over existing DP algorithms in online RL on various benchmarks.
Motivated by the increasing deployment of reinforcement learning in the real world, which consumes large amounts of personal data, we introduce a differentially private (DP) policy gradient algorithm. We show that, in this setting, the introduction of differential privacy can be reduced to the computation of appropriate trust regions, thus avoiding the sacrifice of the theoretical properties of the non-private methods. We then show that it is possible to find the right trade-off between privacy noise and trust-region size to obtain a performant differentially private policy gradient algorithm. Finally, we evaluate its performance empirically on various benchmarks. Our results and the complexity of the tasks addressed represent a significant improvement over existing DP algorithms in online RL.
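
To make the trade-off between privacy noise and trust-region size concrete, here is a minimal sketch of one privatized policy gradient step. It is not the algorithm from the paper, whose details are not given in the abstract. It assumes a standard Gaussian mechanism applied to clipped per-user gradients, followed by a simple norm bound standing in for the trust region. The function name and the parameters clip_norm, noise_multiplier and step_radius are hypothetical.

```python
import numpy as np


def dp_policy_gradient_step(theta, per_user_grads, clip_norm,
                            noise_multiplier, step_radius, rng):
    """One noisy ascent step (illustrative sketch, not the paper's method)."""
    grads = np.asarray(per_user_grads, dtype=float)

    # Clip each user's gradient so that a single user can change the
    # summed gradient by at most clip_norm (bounded sensitivity).
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    clipped = grads * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))

    # Gaussian mechanism: the noise scale is proportional to the sensitivity.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=theta.shape)
    update = (clipped.sum(axis=0) + noise) / len(grads)

    # Stand-in for a trust region (assumption): rescale the update so that
    # its norm never exceeds step_radius, however large the noise is.
    update_norm = np.linalg.norm(update)
    if update_norm > step_radius:
        update = update * (step_radius / update_norm)

    return theta + update
```

In this sketch, a larger noise_multiplier means more injected noise per step, while a smaller step_radius limits how far any single noisy update can move the policy parameters. Choosing the two together is one simple way to picture the trade-off the abstract describes.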