On Global Convergence Rates for Federated Policy Gradient under Heterogeneous Environment
The paper addresses convergence of policy gradient methods in federated reinforcement learning under heterogeneous environments; it is an incremental improvement, supported by theoretical and empirical validation.
The paper tackles the challenge of ensuring convergence for policy gradient methods in federated reinforcement learning under heterogeneous environments. It proves that FedPG achieves linear speed-up in the number of agents and introduces b-RS-FedPG; both methods outperform federated Q-learning empirically.
Ensuring convergence of policy gradient methods in federated reinforcement learning (FRL) under environment heterogeneity remains a major challenge. In this work, we first establish that heterogeneity, perhaps counter-intuitively, can force optimal policies to be non-deterministic or even time-varying, even in tabular environments. Subsequently, we prove global convergence results for federated policy gradient (FedPG) algorithms employing local updates, under a Łojasiewicz condition that holds only for each individual agent, in both entropy-regularized and non-regularized scenarios. Crucially, our theoretical analysis shows that FedPG attains linear speed-up with respect to the number of agents, a property central to efficient federated learning. Leveraging insights from our theoretical findings, we introduce b-RS-FedPG, a novel policy gradient method that employs a carefully constructed softmax-inspired parameterization coupled with an appropriate regularization scheme. We further establish explicit convergence rates for b-RS-FedPG toward near-optimal stationary policies. Finally, we show empirically that both FedPG and b-RS-FedPG consistently outperform federated Q-learning in heterogeneous settings.
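To make the "local updates, then averaging" structure of a federated policy gradient method concrete, here is a minimal sketch in Python with NumPy. It is not the paper's FedPG or b-RS-FedPG: it assumes a toy single-state problem in which each agent has its own reward vector, uses exact gradients instead of sampled trajectories, and applies plain softmax parameterization with entropy regularization. The names `fed_pg`, `local_gradient`, `average_return`, and `REWARDS`, and all hyperparameter values, are hypothetical choices made for illustration. In this single-state setting the averaged objective reduces to a bandit with mean rewards, so the example does not exhibit the non-deterministic or time-varying optimal policies discussed in the abstract.

```python
import numpy as np


def log_softmax(theta):
    # Numerically stable log of the softmax policy.
    z = theta - theta.max()
    return z - np.log(np.exp(z).sum())


def softmax(theta):
    return np.exp(log_softmax(theta))


def local_gradient(theta, rewards, lam):
    # Exact gradient of  pi . rewards + lam * entropy(pi)  w.r.t. theta
    # for a softmax policy over the actions of a single-state problem.
    log_pi = log_softmax(theta)
    pi = np.exp(log_pi)
    soft_rewards = rewards - lam * log_pi
    return pi * (soft_rewards - pi @ soft_rewards)


def fed_pg(reward_table, rounds=500, local_steps=3, lr=0.2, lam=0.01):
    # reward_table[i] is the reward vector of agent i (heterogeneous agents).
    n_agents, n_actions = reward_table.shape
    theta = np.zeros(n_actions)  # shared parameters held by the server
    for _round in range(rounds):
        local_thetas = []
        for i in range(n_agents):
            local_theta = theta.copy()  # each agent starts from the server copy
            for _step in range(local_steps):
                # Gradient ascent on the agent's own regularized objective.
                grad = local_gradient(local_theta, reward_table[i], lam)
                local_theta = local_theta + lr * grad
            local_thetas.append(local_theta)
        # Server step: average the locally updated parameters.
        theta = np.mean(local_thetas, axis=0)
    return theta


def average_return(theta, reward_table):
    # Unregularized return of the shared policy, averaged over agents.
    return float(softmax(theta) @ reward_table.mean(axis=0))


# Three agents whose individually best actions differ (0, 1 and 2).
REWARDS = np.array([
    [1.0, 0.0, 0.5],
    [0.0, 1.0, 0.6],
    [0.2, 0.1, 0.9],
])

theta_final = fed_pg(REWARDS)
print("shared policy:", np.round(softmax(theta_final), 3))
print("average return:", round(average_return(theta_final, REWARDS), 3))
```

Each agent's local steps pull the parameters toward its own best action, and the server average decides which action the shared policy favors. Increasing `local_steps` or `lr` makes the local drift larger, which is the kind of effect a convergence analysis with local updates has to control.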