The Sample-Communication Complexity Trade-off in Federated Q-Learning
This work addresses efficiency challenges in multi-agent reinforcement learning for distributed systems, providing a foundational characterization of trade-offs.
The paper tackles the trade-off between sample and communication complexities in federated Q-learning, establishing a lower bound on communication cost for speedup and proposing Fed-DVR-Q, which achieves order-optimal complexities in both metrics.
We consider the problem of federated Q-learning, where $M$ agents aim to collaboratively learn the optimal Q-function of an unknown infinite-horizon Markov decision process with finite state and action spaces. We investigate the trade-off between sample and communication complexities for the widely used class of intermittent communication algorithms. We first establish the converse result, where it is shown that a federated Q-learning algorithm that offers any speedup with respect to the number of agents in the per-agent sample complexity needs to incur a communication cost of at least an order of $\frac{1}{1-γ}$ up to logarithmic factors, where $γ$ is the discount factor. We also propose a new algorithm, called Fed-DVR-Q, which is the first federated Q-learning algorithm to simultaneously achieve order-optimal sample and communication complexities. Thus, together these results provide a complete characterization of the sample-communication complexity trade-off in federated Q-learning.