MA · AI · DC · LG · Dec 30, 2013

Distributed Policy Evaluation Under Multiple Behavior Strategies

arXiv:1312.7606v2 · 109 citations
Originality: Incremental advance
AI Analysis

This work addresses distributed policy evaluation in multi-agent systems. It is an incremental advance: agents that exchange estimates with their neighbors learn more stably and more accurately than agents that learn alone.

The paper tackles distributed reinforcement learning in which agents cooperate with their neighbors to predict how the environment responds, including in off-policy settings. The results show that cooperation increases stability, reduces the bias and variance of the prediction error, and lets the network approach the optimal solution even when no individual agent can. The algorithm is efficient: its computation time and memory footprint both scale linearly.

We apply diffusion strategies to develop a fully-distributed cooperative reinforcement learning algorithm in which agents in a network communicate only with their immediate neighbors to improve predictions about their environment. The algorithm can also be applied to off-policy learning, meaning that the agents can predict the response to a behavior different from the actual policies they are following. The proposed distributed strategy is efficient, with linear complexity in both computation time and memory footprint. We provide a mean-square-error performance analysis and establish convergence under constant step-size updates, which endow the network with continuous learning capabilities. The results show a clear gain from cooperation: when the individual agents can estimate the solution, cooperation increases stability and reduces bias and variance of the prediction error; but, more importantly, the network is able to approach the optimal solution even when none of the individual agents can (e.g., when the individual behavior policies restrict each agent to sample a small portion of the state space).
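The cooperation gain described above can be illustrated with a small adapt-then-combine sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the 8-state deterministic cycle, the tabular value estimates, the plain TD(0) local update (standing in for whatever update rule the paper actually uses), the 4-agent ring with uniform combination weights, and the helper names `ring_weights` and `max_error`. Each agent samples transitions from only two states, so on its own it never learns the values it bootstraps from.

```python
import numpy as np

def ring_weights(n_agents):
    """Uniform weights over self and the two ring neighbors (doubly stochastic).
    Assumes n_agents >= 3 so the two neighbors are distinct."""
    eye = np.eye(n_agents)
    return (eye + np.roll(eye, 1, axis=1) + np.roll(eye, -1, axis=1)) / 3.0

def max_error(A, n_states=8, gamma=0.9, mu=0.1, steps=20000, seed=0):
    """Run adapt-then-combine TD(0) and return the largest absolute value
    error over all agents and states. A[l, k] is the weight agent k gives
    to the intermediate estimate of agent l."""
    n_agents = A.shape[0]
    block = n_states // n_agents            # states visible to each agent
    rng = np.random.default_rng(seed)

    rewards = np.arange(n_states, dtype=float)
    P = np.roll(np.eye(n_states), 1, axis=1)   # deterministic cycle s -> s+1
    v_true = np.linalg.solve(np.eye(n_states) - gamma * P, rewards)

    W = np.zeros((n_agents, n_states))      # row k: agent k's value estimate
    for _ in range(steps):
        psi = W.copy()
        # Adapt: each agent takes one local TD(0) step on a state it can see.
        for k in range(n_agents):
            s = k * block + rng.integers(block)
            s_next = (s + 1) % n_states
            delta = rewards[s] + gamma * W[k, s_next] - W[k, s]
            psi[k, s] += mu * delta         # constant step size
        # Combine: each agent averages its neighbors' intermediate estimates.
        W = A.T @ psi
    return np.abs(W - v_true).max()

if __name__ == "__main__":
    print("cooperating:", max_error(ring_weights(4)))
    print("isolated:   ", max_error(np.eye(4)))
```

Passing the identity matrix as `A` switches cooperation off, which makes the comparison direct: the isolated agents keep a large error on the states they never sample, while the ring network recovers the full value function. Both the per-step cost and the storage per agent grow linearly with the number of states in this tabular setting.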
