FedRLHF: A Convergence-Guaranteed Federated Framework for Privacy-Preserving and Personalized RLHF
FedRLHF addresses privacy concerns and the demand for personalized experiences in RLHF, which matters to both users and developers. The contribution is incremental, however, because it adapts existing federated learning techniques to RLHF.
The paper tackles privacy and personalization in Reinforcement Learning with Human Feedback (RLHF) by introducing FedRLHF, a federated framework that decentralizes RLHF so that clients never share raw data or feedback. In evaluations on the MovieLens and IMDb datasets, it achieves performance comparable to centralized RLHF while enhancing personalization.
In the era of increasing privacy concerns and demand for personalized experiences, traditional Reinforcement Learning with Human Feedback (RLHF) frameworks face significant challenges due to their reliance on centralized data. We introduce Federated Reinforcement Learning with Human Feedback (FedRLHF), a novel framework that decentralizes the RLHF process. FedRLHF enables collaborative policy learning across multiple clients without necessitating the sharing of raw data or human feedback, thereby ensuring robust privacy preservation. Leveraging federated reinforcement learning, each client integrates human feedback locally into their reward functions and updates their policies through personalized RLHF processes. We establish rigorous theoretical foundations for FedRLHF, providing convergence guarantees, and deriving sample complexity bounds that scale efficiently with the number of clients. Empirical evaluations on the MovieLens and IMDb datasets demonstrate that FedRLHF not only preserves user privacy but also achieves performance on par with centralized RLHF, while enhancing personalization across diverse client environments.
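The federated loop described in the abstract can be illustrated with a toy sketch. This is not the paper's algorithm: it assumes a three-armed bandit per client, a softmax policy over logits, exact policy-gradient steps, and plain parameter averaging on the server. The names `local_update`, `fed_avg`, `run_fedrlhf_sketch`, the weight `lam`, and all reward and feedback values are hypothetical and chosen only for illustration. The point it shows is that each client folds its private feedback into its own reward locally, and only policy parameters reach the server.

```python
import math


def softmax(theta):
    """Numerically stable softmax over a list of logits."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]


def local_update(theta, base_reward, feedback, lam=0.5, lr=0.5, steps=5):
    """One client's local update (toy assumption, not the paper's method).

    The private human feedback is folded into the reward on the client,
    then a few exact policy-gradient steps are taken on the policy logits.
    """
    theta = list(theta)
    # Shaped reward: intrinsic reward plus weighted human feedback (stays local).
    reward = [r + lam * f for r, f in zip(base_reward, feedback)]
    for _ in range(steps):
        pi = softmax(theta)
        expected = sum(p * r for p, r in zip(pi, reward))
        # Gradient of expected reward w.r.t. logit a: pi_a * (r_a - E[r]).
        grad = [p * (r - expected) for p, r in zip(pi, reward)]
        theta = [t + lr * g for t, g in zip(theta, grad)]
    return theta


def fed_avg(client_thetas):
    """Server step: unweighted average of the clients' policy parameters."""
    n = len(client_thetas)
    return [sum(col) / n for col in zip(*client_thetas)]


def run_fedrlhf_sketch(clients, num_arms, rounds=50):
    """Alternate local updates and server averaging for a fixed number of rounds."""
    theta = [0.0] * num_arms
    for _ in range(rounds):
        # Only updated logits leave each client; rewards and feedback do not.
        updates = [
            local_update(theta, c["base_reward"], c["feedback"]) for c in clients
        ]
        theta = fed_avg(updates)
    return theta


# Hypothetical clients with private rewards and feedback over three actions.
clients = [
    {"base_reward": [1.0, 0.2, 0.0], "feedback": [0.5, 0.0, -0.5]},
    {"base_reward": [0.8, 0.3, 0.1], "feedback": [1.0, -0.2, 0.0]},
    {"base_reward": [0.9, 0.1, 0.2], "feedback": [0.2, 0.1, -0.1]},
]
global_theta = run_fedrlhf_sketch(clients, num_arms=3)
global_policy = softmax(global_theta)
```

In this sketch every client ends up sharing one global policy. A personalized variant could, for example, keep part of each client's parameters local instead of averaging them; that extension is likewise an assumption and is not taken from the abstract.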