MLLGFeb 20, 2025

A Finite Sample Analysis of Distributional TD Learning with Linear Function Approximation

arXiv:2502.14172v28 citationsh-index: 4
Originality Incremental advance
AI Analysis

This provides theoretical insights into the statistical efficiency of distributional reinforcement learning algorithms, but it is incremental as it extends prior tabular analyses to linear approximation.

The paper tackles the problem of estimating the return distribution in Markov decision processes using distributional TD learning with linear function approximation, and shows that the sample complexity matches that of classic linear TD learning, implying no additional difficulty in learning the full distribution compared to its expectation.

In this paper, we study the finite-sample statistical rates of distributional temporal difference (TD) learning with linear function approximation. The aim of distributional TD learning is to estimate the return distribution of a discounted Markov decision process for a given policy π. Previous works on statistical analysis of distributional TD learning mainly focus on the tabular case. In contrast, we first consider the linear function approximation setting and derive sharp finite-sample rates. Our theoretical results demonstrate that the sample complexity of linear distributional TD learning matches that of classic linear TD learning. This implies that, with linear function approximation, learning the full distribution of the return from streaming data is no more difficult than learning its expectation (value function). To derive tight sample complexity bounds, we conduct a fine-grained analysis of the linear-categorical Bellman equation and employ the exponential stability arguments for products of random matrices. Our results provide new insights into the statistical efficiency of distributional reinforcement learning algorithms.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes