LG · Oct 8, 2025

Parameter-Free Federated TD Learning with Markov Noise in Heterogeneous Environments

arXiv:2510.07436v1 · 2 citations · h-index: 9
Originality: Incremental advance
AI Analysis

This work addresses a gap in federated reinforcement learning by enabling parameter-free algorithms for heterogeneous environments, though it is incremental as it builds on existing convergence rate frameworks.

The paper tackles the problem of parameter-free federated temporal difference learning with Markovian data in heterogeneous environments, proposing a two-timescale method with Polyak-Ruppert averaging that provably achieves the optimal convergence rate of $\tilde{O}(1/(NT))$.

Federated learning (FL) can dramatically speed up reinforcement learning by distributing exploration and training across multiple agents. It can guarantee an optimal convergence rate that scales linearly in the number of agents, i.e., a rate of $\tilde{O}(1/(NT))$, where $T$ is the iteration index and $N$ is the number of agents. However, when the training samples arise from a Markov chain, existing results on TD learning that achieve this rate require the algorithm to depend on unknown problem parameters. We close this gap by proposing a two-timescale Federated Temporal Difference (FTD) learning method with Polyak-Ruppert averaging. Our method provably attains the optimal $\tilde{O}(1/(NT))$ rate in both average-reward and discounted settings, offering a parameter-free FTD approach for Markovian data. Although our results are novel even in the single-agent setting, they apply to the more realistic and challenging scenario of FL with heterogeneous environments.
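
To make the ingredients named in the abstract concrete, here is a minimal sketch of federated TD(0) with linear function approximation, Markovian sampling, heterogeneous per-agent transition matrices, and a running Polyak-Ruppert average. It is not the authors' algorithm: the synthetic chains, the step-size schedules `1/(t+1)**0.6` and `1/(t+1)`, server averaging at every step, and all function and parameter names (`make_chain`, `federated_td`, `eps`, and so on) are illustrative assumptions. It covers only the discounted setting.

```python
import numpy as np

def make_chain(rng, n_states, base_P, eps):
    """Perturb a shared transition matrix to model one agent's environment
    (hypothetical heterogeneity model; eps controls how far agents differ)."""
    noise = rng.random((n_states, n_states))
    noise /= noise.sum(axis=1, keepdims=True)
    return (1 - eps) * base_P + eps * noise  # rows still sum to 1

def federated_td(n_agents=8, n_states=5, dim=3, T=2000,
                 gamma=0.9, eps=0.1, seed=0):
    rng = np.random.default_rng(seed)
    base = rng.random((n_states, n_states))
    base_P = base / base.sum(axis=1, keepdims=True)
    chains = [make_chain(rng, n_states, base_P, eps) for _ in range(n_agents)]
    rewards = rng.random(n_states)
    Phi = rng.standard_normal((n_states, dim))
    Phi /= np.linalg.norm(Phi, axis=1, keepdims=True)  # unit-norm features

    theta = np.zeros(dim)       # fast iterate shared by all agents
    theta_bar = np.zeros(dim)   # slow iterate: Polyak-Ruppert average
    states = rng.integers(n_states, size=n_agents)

    for t in range(T):
        alpha = 1.0 / (t + 1) ** 0.6  # assumed schedule, uses no problem constants
        beta = 1.0 / (t + 1)          # makes theta_bar the running mean of theta
        direction = np.zeros(dim)
        for i, P in enumerate(chains):
            s = states[i]
            s_next = rng.choice(n_states, p=P[s])  # one Markovian transition
            td_error = rewards[s] + gamma * Phi[s_next] @ theta - Phi[s] @ theta
            direction += td_error * Phi[s]
            states[i] = s_next
        theta = theta + alpha * direction / n_agents    # server averages agents
        theta_bar = theta_bar + beta * (theta - theta_bar)
    return theta, theta_bar

if __name__ == "__main__":
    last, averaged = federated_td()
    print("last iterate:    ", last)
    print("averaged iterate:", averaged)
```

Averaging the `n_agents` TD directions before each update is what reduces the noise in the shared iterate as agents are added, and the averaged iterate `theta_bar` is the quantity a Polyak-Ruppert analysis would bound. The sketch makes no claim about matching the paper's rate or step-size choices.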

