Differentially Private Deep Model-Based Reinforcement Learning
This work addresses privacy concerns in reinforcement learning for control tasks. It extends differential privacy guarantees to more complex settings than prior work covered, though its advance over existing privacy methods is incremental.
The paper tackles private deep offline reinforcement learning by introducing PriMORL, a model-based RL algorithm with differential privacy guarantees. PriMORL enables training on offline continuous control tasks with deep function approximations, whereas prior methods were limited to simpler tabular and linear Markov Decision Processes (MDPs).
We address private deep offline reinforcement learning (RL), where the goal is to train a policy on standard control tasks that is differentially private (DP) with respect to individual trajectories in the dataset. To achieve this, we introduce PriMORL, a model-based RL algorithm with formal differential privacy guarantees. PriMORL first learns an ensemble of trajectory-level DP models of the environment from offline data. It then optimizes a policy on the penalized private model, without any further interaction with the system or access to the dataset. In addition to offering strong theoretical foundations, we demonstrate empirically that PriMORL enables the training of private RL agents on offline continuous control tasks with deep function approximations, whereas current methods are limited to simpler tabular and linear Markov Decision Processes (MDPs). We furthermore outline the trade-offs involved in achieving privacy in this setting.
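To make the notion of trajectory-level privacy concrete, the following is a minimal sketch of one clip-and-noise gradient step in which the unit of protection is a whole trajectory rather than a single transition. It is not PriMORL's actual training procedure: the linear dynamics model, the function names (`clip_by_norm`, `trajectory_gradient`, `private_gradient`) and the parameters `clip_norm` and `noise_multiplier` are all assumptions chosen for illustration, and the abstract does not specify how the DP models are trained.

```python
import numpy as np


def clip_by_norm(vec, max_norm):
    """Scale vec down so that its L2 norm is at most max_norm."""
    norm = np.linalg.norm(vec)
    return vec * min(1.0, max_norm / (norm + 1e-12))


def trajectory_gradient(weights, states, actions, next_states):
    """Gradient of the mean squared error of a linear dynamics model
    s' ~ [s, a] @ weights, averaged over the steps of ONE trajectory.
    (A linear model is an illustrative stand-in for a deep model.)"""
    inputs = np.concatenate([states, actions], axis=1)  # (T, ds + da)
    residual = inputs @ weights - next_states           # (T, ds)
    return 2.0 * inputs.T @ residual / len(states)      # (ds + da, ds)


def private_gradient(weights, trajectories, clip_norm, noise_multiplier, rng):
    """Hypothetical trajectory-level clip-and-noise aggregation:
    each trajectory contributes one gradient, clipped to clip_norm,
    so adding or removing a trajectory changes the sum by at most clip_norm."""
    total = np.zeros_like(weights)
    for states, actions, next_states in trajectories:
        grad = trajectory_gradient(weights, states, actions, next_states)
        total += clip_by_norm(grad.ravel(), clip_norm).reshape(weights.shape)
    # Gaussian noise calibrated to the per-trajectory sensitivity.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=weights.shape)
    return (total + noise) / len(trajectories)


# Usage: four synthetic trajectories, 3-dim states, 2-dim actions.
rng = np.random.default_rng(0)
trajectories = [
    (rng.normal(size=(5, 3)), rng.normal(size=(5, 2)), rng.normal(size=(5, 3)))
    for _ in range(4)
]
weights = np.zeros((5, 3))
noisy_grad = private_gradient(weights, trajectories, 1.0, 1.1, rng)
weights -= 0.1 * noisy_grad  # one noisy gradient step
```

Clipping per trajectory, rather than per transition, is what bounds the influence of any single trajectory on the update. The sketch omits privacy accounting: the resulting (ε, δ) guarantee depends on the noise multiplier, the sampling scheme, and the number of steps, none of which are derived here.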