LG | DC | Mar 13, 2024

One-Shot Averaging for Distributed TD($λ$) Under Markov Sampling

arXiv:2403.08896v2 | 6 citations | h-index: 41 | IEEE Control Systems Letters
AI Analysis

This work addresses the challenge of efficient distributed reinforcement learning for multi-agent systems, offering a communication-efficient solution that improves incrementally on existing approaches.

The paper tackles distributed policy evaluation in reinforcement learning. It shows that N agents running TD(λ) under Markov sampling can evaluate a policy N times faster, provided the target accuracy is small enough, by using a one-shot averaging procedure: the agents communicate only once, after the final step, which significantly reduces communication compared to prior methods.

We consider a distributed setup for reinforcement learning, where each agent has a copy of the same Markov Decision Process but transitions are sampled from the corresponding Markov chain independently by each agent. We show that in this setting, we can achieve a linear speedup for TD($λ$), a family of popular methods for policy evaluation, in the sense that $N$ agents can evaluate a policy $N$ times faster provided the target accuracy is small enough. Notably, this speedup is achieved by "one-shot averaging," a procedure where the agents run TD($λ$) with Markov sampling independently and only average their results after the final step. This significantly reduces the amount of communication required to achieve a linear speedup relative to previous work.
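The procedure described in the abstract can be illustrated with a minimal Python sketch. Everything specific here is an assumption for illustration and not taken from the paper: the function names `td_lambda` and `one_shot_average`, the three-state toy chain, the tabular features, the constant step size, and the chosen values of $λ$ and the discount factor. Each simulated agent runs TD($λ$) with linear function approximation along its own independently sampled Markov trajectory, and the parameter vectors are averaged exactly once, after the final step.

```python
import numpy as np


def td_lambda(P, r, Phi, gamma, lam, alpha, T, rng):
    """Run TD(lambda) with linear features along one Markov trajectory.

    P: (S, S) transition matrix of the chain induced by the policy.
    r: (S,) expected reward in each state.
    Phi: (S, d) feature matrix, one row per state.
    """
    n_states, d = Phi.shape
    theta = np.zeros(d)            # value-function parameters
    z = np.zeros(d)                # eligibility trace
    s = rng.integers(n_states)     # arbitrary start state (an assumption)
    for _ in range(T):
        s_next = rng.choice(n_states, p=P[s])   # Markov sampling
        delta = r[s] + gamma * Phi[s_next] @ theta - Phi[s] @ theta
        z = gamma * lam * z + Phi[s]
        theta = theta + alpha * delta * z
        s = s_next
    return theta


def one_shot_average(P, r, Phi, gamma, lam, alpha, T, n_agents, seed=0):
    """Agents run independently; results are averaged once at the end."""
    child_seeds = np.random.SeedSequence(seed).spawn(n_agents)
    thetas = [
        td_lambda(P, r, Phi, gamma, lam, alpha, T, np.random.default_rng(cs))
        for cs in child_seeds
    ]
    return np.mean(thetas, axis=0)   # the single communication round


# Hypothetical toy problem: 3 states, tabular features.
P = np.array([[0.9, 0.1, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.1, 0.9]])
r = np.array([1.0, 0.0, -1.0])
Phi = np.eye(3)
gamma = 0.9

# Exact values for comparison: V = (I - gamma * P)^{-1} r.
v_true = np.linalg.solve(np.eye(3) - gamma * P, r)

theta_avg = one_shot_average(P, r, Phi, gamma, lam=0.5, alpha=0.05,
                             T=2000, n_agents=8)
print("max abs error of averaged estimate:", np.abs(theta_avg - v_true).max())
```

Varying `n_agents` while holding `T` fixed gives a rough, informal feel for how averaging reduces the error of the final estimate. The sketch makes no attempt to reproduce the step-size schedule or the conditions under which the paper's linear-speedup guarantee holds.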
