MLLGFeb 5, 2025

Gap-Dependent Bounds for Federated $Q$-learning

arXiv:2502.02859v25 citationsh-index: 4ICML
Originality Highly original
AI Analysis

This work addresses communication efficiency in federated reinforcement learning for multi-agent systems, offering incremental improvements over worst-case methods.

The paper tackles the problem of federated Q-learning in tabular episodic MDPs by introducing a gap-dependent analysis, achieving a log T-type regret bound and refined communication cost that removes dependence on MSA in the log term, with results showing improved global switching cost for M=1.

We present the first gap-dependent analysis of regret and communication cost for on-policy federated $Q$-Learning in tabular episodic finite-horizon Markov decision processes (MDPs). Existing FRL methods focus on worst-case scenarios, leading to $\sqrt{T}$-type regret bounds and communication cost bounds with a $\log T$ term scaling with the number of agents $M$, states $S$, and actions $A$, where $T$ is the average total number of steps per agent. In contrast, our novel framework leverages the benign structures of MDPs, such as a strictly positive suboptimality gap, to achieve a $\log T$-type regret bound and a refined communication cost bound that disentangles exploration and exploitation. Our gap-dependent regret bound reveals a distinct multi-agent speedup pattern, and our gap-dependent communication cost bound removes the dependence on $MSA$ from the $\log T$ term. Notably, our gap-dependent communication cost bound also yields a better global switching cost when $M=1$, removing $SA$ from the $\log T$ term.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes