LG · OC · Oct 1, 2023

A primal-dual perspective for distributed TD-learning

arXiv:2310.00638v3 · 2 citations · h-index: 4
Originality: Incremental advance
AI Analysis

This work addresses distributed reinforcement learning for networked agents, offering a novel optimization perspective that relaxes communication constraints, though it appears incremental in method.

The paper tackles the problem of distributed temporal difference learning for multi-agent Markov decision processes by proposing a primal-dual ODE-based approach, achieving exponential convergence without requiring doubly stochastic communication networks.

This paper investigates distributed temporal difference (TD) learning for a networked multi-agent Markov decision process. The proposed approach builds on distributed optimization algorithms that can be interpreted as primal-dual ordinary differential equation (ODE) dynamics subject to null-space constraints. Using the exponential convergence of these dynamics, we examine the behavior of the final iterate in various distributed TD-learning scenarios, covering both constant and diminishing step-sizes and both i.i.d. and Markovian observation models. Unlike existing methods, the proposed algorithm does not require the underlying communication network to be characterized by a doubly stochastic matrix.
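To make the primal-dual idea concrete, here is a minimal sketch in Python. It is not the paper's algorithm: it is an Euler discretization of the noise-free (mean) dynamics, under assumptions chosen for illustration. The toy transition matrix `P`, features `Phi`, per-agent rewards `R`, the path-graph Laplacian `L`, and the function name `primal_dual_td` are all hypothetical. Each agent keeps its own parameter vector (a row of `theta`), and a dual variable `w` enforces the null-space constraint `L @ theta = 0`, that is, consensus across agents.

```python
import numpy as np

def primal_dual_td(A, b_locals, L, step=0.05, iters=5000):
    """Euler steps of a primal-dual ODE for distributed TD (mean dynamics only).

    A        : (q, q) matrix Phi^T D (I - gamma P) Phi, shared by all agents
    b_locals : (N, q) array, row i is Phi^T D r_i for agent i's local reward
    L        : (N, N) Laplacian of an undirected communication graph
    """
    N, q = b_locals.shape
    theta = np.zeros((N, q))  # primal variable: one parameter vector per agent
    w = np.zeros((N, q))      # dual variable for the constraint L @ theta = 0
    for _ in range(iters):
        td = b_locals - theta @ A.T          # expected local TD direction per agent
        theta_dot = td - L @ theta - L @ w   # primal flow: TD step plus consensus terms
        w_dot = L @ theta                    # dual flow: accumulates disagreement
        theta = theta + step * theta_dot
        w = w + step * w_dot
    return theta, w

# Toy problem (all values are assumptions for illustration).
gamma = 0.9
P = np.array([[0.5, 0.5, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 0.5, 0.5]])   # chain whose stationary distribution is uniform
D = np.diag(np.full(3, 1.0 / 3.0))
Phi = np.array([[1.0, 0.0],
                [0.0, 1.0],
                [1.0, 1.0]])      # linear features, 3 states x 2 parameters
R = np.array([[1.0, 0.0, 0.0],
              [0.0, 2.0, 0.0],
              [0.0, 0.0, 3.0]])   # row i: agent i's reward in each state
L = np.array([[ 1.0, -1.0,  0.0],
              [-1.0,  2.0, -1.0],
              [ 0.0, -1.0,  1.0]])  # path graph: agent 1 - agent 2 - agent 3

A = Phi.T @ D @ (np.eye(3) - gamma * P) @ Phi
b_locals = R @ D @ Phi            # row i equals (Phi^T D r_i)^T

theta, w = primal_dual_td(A, b_locals, L)

# Centralized TD fixed point for the average reward, used as a reference.
theta_central = np.linalg.solve(A, b_locals.mean(axis=0))
print("max deviation from centralized solution:",
      np.abs(theta - theta_central).max())
```

In this sketch every agent only uses its own reward and its neighbors' variables (through `L`), yet all rows of `theta` approach the solution for the averaged reward. A stochastic version would replace `A` and `b_locals` with sampled TD terms, which is where the step-size and observation-model conditions mentioned above come in.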

Code Implementations: 1 repo