LG · IT · SY · OC · ML · Nov 3, 2019

Finite-Sample Analysis of Decentralized Temporal-Difference Learning with Linear Function Approximation

arXiv:1911.00934v2 · 21 citations
Originality: Highly original
AI Analysis

This paper addresses the limited theoretical understanding of decentralized TD learning, which matters for applications such as networked robotics, by deriving error bounds that hold under practical assumptions.

The paper tackles the policy evaluation problem in decentralized multi-agent reinforcement learning using TD(0) with linear function approximation. It provides a finite-sample analysis under both i.i.d. and Markovian samples and proves that the local estimates converge linearly to a small neighborhood of the optimum.

Motivated by the emerging use of multi-agent reinforcement learning (MARL) in engineering applications such as networked robotics, swarming drones, and sensor networks, we investigate the policy evaluation problem in a fully decentralized setting, using temporal-difference (TD) learning with linear function approximation to handle large state spaces in practice. The goal of a group of agents is to collaboratively learn the value function of a given policy from locally private rewards observed in a shared environment, by exchanging local estimates with neighbors. Despite the simplicity and widespread use of such decentralized TD learning algorithms, our theoretical understanding of them remains limited. Existing results were obtained based on i.i.d. data samples, or by imposing an "additional" projection step to control the "gradient" bias incurred by the Markovian observations. In this paper, we provide a finite-sample analysis of fully decentralized TD(0) learning under both i.i.d. and Markovian samples, and prove that all local estimates converge linearly to a small neighborhood of the optimum. The resultant error bounds are the first of their type, in the sense that they hold under the most practical assumptions, which is made possible by a novel multi-step Lyapunov analysis.
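The setting described in the abstract can be illustrated with a minimal sketch. It assumes the common "mix with neighbors, then take a local TD(0) step" form of decentralized TD learning; the paper's exact algorithm, step sizes, and assumptions may differ. The function name `decentralized_td0`, the mixing matrix `W`, the toy two-state chain, and all parameter values are hypothetical and chosen only for illustration.

```python
import random


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def decentralized_td0(P, features, local_rewards, W,
                      gamma=0.9, alpha=0.05, steps=5000, seed=0):
    """Illustrative sketch (not the paper's code): decentralized TD(0)
    with linear function approximation and a constant step size.

    P             -- transition matrix of the chain induced by the fixed policy
    features      -- features[s] is the feature vector of state s
    local_rewards -- local_rewards[i][s] is agent i's private reward in state s
    W             -- assumed doubly stochastic mixing matrix over the agents
    """
    rng = random.Random(seed)
    n_agents, n_states, d = len(local_rewards), len(P), len(features[0])
    theta = [[0.0] * d for _ in range(n_agents)]
    s = 0
    for _ in range(steps):
        # Markovian sample: all agents observe the same transition s -> s_next.
        s_next = rng.choices(range(n_states), weights=P[s])[0]
        phi, phi_next = features[s], features[s_next]
        new_theta = []
        for i in range(n_agents):
            # Consensus step: weighted average of the agents' current estimates.
            mixed = [sum(W[i][j] * theta[j][k] for j in range(n_agents))
                     for k in range(d)]
            # Local TD error, computed from agent i's private reward only.
            delta = (local_rewards[i][s]
                     + gamma * dot(phi_next, theta[i])
                     - dot(phi, theta[i]))
            new_theta.append([mixed[k] + alpha * delta * phi[k]
                              for k in range(d)])
        theta = new_theta
        s = s_next
    return theta


# Toy problem (hypothetical): two states, three agents, one-hot features.
P = [[0.5, 0.5], [0.5, 0.5]]
features = [[1.0, 0.0], [0.0, 1.0]]
local_rewards = [[1.5, 0.0], [0.0, 1.5], [0.0, 0.0]]
W = [[0.5, 0.25, 0.25], [0.25, 0.5, 0.25], [0.25, 0.25, 0.5]]

estimates = decentralized_td0(P, features, local_rewards, W)
```

In this toy setup the reward averaged over agents is 0.5 in every state, so the value function of the averaged reward is 0.5 / (1 - 0.9) = 5 in both states. No single agent sees that averaged reward, yet each local estimate should settle close to 5, with a residual disagreement between agents on the order of the step size. This mirrors, informally, the "small neighborhood of the optimum" behavior the abstract describes.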

