LG | SY | OC | Apr 20, 2022

Exact Formulas for Finite-Time Estimation Errors of Decentralized Temporal Difference Learning with Linear Function Approximation

arXiv:2204.09801v1 | 2 citations | h-index: 57
Originality: Incremental advance
AI Analysis

The paper provides precise error quantification for policy evaluation in multi-agent reinforcement learning. The advance is incremental because it builds on existing decentralized TD methods.

The paper derives exact closed-form formulas for the finite-time mean-squared estimation errors of decentralized temporal difference learning with linear function approximation in multi-agent reinforcement learning. It shows that, under a stability condition, the error converges to an exact limit at a specific exponential rate.

In this paper, we consider the policy evaluation problem in multi-agent reinforcement learning (MARL) and derive exact closed-form formulas for the finite-time mean-squared estimation errors of decentralized temporal difference (TD) learning with linear function approximation. Our analysis hinges upon the fact that the decentralized TD learning method can be viewed as a Markov jump linear system (MJLS). Then standard MJLS theory can be applied to quantify the mean and covariance matrix of the estimation error of the decentralized TD method at every time step. Various implications of our exact formulas on the algorithm performance are also discussed. An interesting finding is that under a necessary and sufficient stability condition, the mean-squared TD estimation error will converge to an exact limit at a specific exponential rate.
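To make the MJLS viewpoint concrete, here is a minimal sketch in Python. It is not the paper's decentralized formulation. It assumes a simplified single-agent TD(0) update with linear features, where the jump parameter is the state pair z_k = (s_k, s_{k+1}), and it propagates only the mean of the iterate (the paper also treats the covariance). The function name `td_mjls_mean` and all of its parameters are hypothetical, chosen for illustration. In each mode the update is affine, theta_{k+1} = H(z_k) theta_k + g(z_k), so the quantities q_k(i) = E[theta_k 1{z_k = i}] obey an exact linear recursion.

```python
import numpy as np


def td_mjls_mean(P, Phi, r, gamma, alpha, theta0, mu0, num_steps):
    """Exact E[theta_k], k = 0..num_steps, for single-agent TD(0) with linear features.

    Illustrative assumptions (not taken from the paper):
      P      (n, n) state transition matrix
      Phi    (n, d) feature matrix, row s is phi(s)
      r      (n,)   reward received in state s
      mu0    (n,)   distribution of the initial state s_0
    The jump parameter is the pair z_k = (s_k, s_{k+1}), giving n * n modes.
    """
    n, d = Phi.shape
    modes = [(s, s2) for s in range(n) for s2 in range(n)]
    m = len(modes)

    # Mode-dependent affine update: theta_{k+1} = H[z] @ theta_k + g[z].
    H = [np.eye(d) + alpha * np.outer(Phi[s], gamma * Phi[s2] - Phi[s])
         for s, s2 in modes]
    g = [alpha * r[s] * Phi[s] for s, _ in modes]

    # Mode transition matrix: (s, s') -> (s', s'') with probability P[s', s''].
    T = np.zeros((m, m))
    for i, (_, s2) in enumerate(modes):
        for j, (t, t2) in enumerate(modes):
            if t == s2:
                T[i, j] = P[t, t2]

    # p[i] = P(z_k = i) and q[i] = E[theta_k * 1{z_k = i}].
    p = np.array([mu0[s] * P[s, s2] for s, s2 in modes])
    q = p[:, None] * theta0[None, :]

    means = [q.sum(axis=0)]
    for _ in range(num_steps):
        # Apply each mode's update, then redistribute mass over the next modes.
        moved = np.array([H[i] @ q[i] + p[i] * g[i] for i in range(m)])
        q = T.T @ moved
        p = T.T @ p
        means.append(q.sum(axis=0))
    return np.array(means)


if __name__ == "__main__":
    P = np.array([[0.9, 0.1], [0.3, 0.7]])
    Phi = np.array([[1.0, 0.0], [0.5, 1.0]])
    r = np.array([1.0, -0.5])
    means = td_mjls_mean(P, Phi, r, gamma=0.9, alpha=0.1,
                         theta0=np.zeros(2), mu0=np.array([0.5, 0.5]),
                         num_steps=50)
    print(means[-1])  # exact mean iterate after 50 steps, no sampling involved
```

Subtracting the TD fixed point from each row gives the mean estimation error. Because the recursion is linear in (q, p), the rate at which that mean settles is governed by the spectrum of the matrix that maps q_k to q_{k+1}, which is the kind of stability condition the abstract refers to.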
