Privacy Preserving Reinforcement Learning with One-Sided Feedback
This paper addresses the challenge of combining privacy and learning efficiency in RL for complex environments with partial feedback, which is relevant to applications such as robotics and healthcare.
The paper tackles privacy-preserving reinforcement learning in multi-dimensional continuous spaces with one-sided feedback. It proposes the POOL algorithm, which achieves a sample complexity matching known lower bounds for non-private RL while enforcing strong privacy guarantees.
We study reinforcement learning (RL) in multi-dimensional continuous state and action spaces with one-sided feedback, where the agent receives partial observations of the state and obtains reward information for only a subset of the state-action space at each time step. This setting introduces substantial challenges in both learning efficiency and privacy preservation. To address these challenges, we propose POOL, a novel privacy-preserving RL algorithm. We conduct a comprehensive theoretical analysis of POOL, deriving a sample complexity bound that matches the known lower bounds for non-private RL. The bound is expressed in terms of the privacy parameter E_rho, the time horizon H, and the optimality-gap parameter alpha. Our findings show that it is possible to enforce strong privacy guarantees while maintaining high learning efficiency, marking a significant step toward practical, privacy-aware RL in multi-dimensional environments with one-sided feedback.
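To make the one-sided feedback setting described above more concrete, the following Python sketch shows one common reading of it: choosing an action reveals the reward of every action at or below it, as in censored-demand inventory problems. This is an assumption made for illustration only, not the paper's formal definition and not part of POOL. The function name `one_sided_feedback`, the discrete action grid, and the `price` and `cost` parameters are all hypothetical, and the paper itself works in continuous spaces.

```python
def one_sided_feedback(action, demand, actions, price=1.0, cost=0.4):
    """Return {a: reward} for every action whose reward is revealed.

    Hypothetical toy model: ordering `action` units reveals only the
    censored sales min(demand, action). That is enough to reconstruct
    the reward of any smaller order a <= action, but not of larger ones.
    """
    observed_sales = min(demand, action)  # true demand is censored at the order level
    revealed = {}
    for a in actions:
        if a <= action:
            # For a <= action, min(demand, a) == min(observed_sales, a),
            # so the reward of `a` follows from what was observed.
            revealed[a] = price * min(observed_sales, a) - cost * a
    return revealed


actions = [0, 1, 2, 3, 4]
feedback = one_sided_feedback(action=2, demand=3, actions=actions)
# Rewards are revealed only for actions 0, 1, and 2; actions 3 and 4 stay unobserved.
print(sorted(feedback))
```

In this toy model a single step yields reward information for a subset of the action space rather than for the chosen action alone, which is the structural property the abstract refers to.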