LG, AI · Jan 28

Regularized Gradient Temporal-Difference Learning

arXiv:2601.20599v1 · h-index: 1

Originality: Incremental advance

AI Analysis

This paper addresses a practical issue in off-policy policy evaluation for reinforcement learning. The contribution is incremental, however, because it builds on existing GTD methods.

The paper tackles the instability of gradient temporal-difference (GTD) learning when the feature interaction matrix becomes singular. It proposes a regularized GTD algorithm that guarantees convergence to a unique solution. The claim is supported by both theoretical and empirical results.
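The toy example below illustrates why a singular feature interaction matrix breaks uniqueness, and how a regularizer restores it. It is a sketch under stated assumptions, not the paper's method.

The sketch relies on the following assumptions:

* It writes the MSPBE in a common linear form, (Aθ + b)ᵀ C⁻¹ (Aθ + b), with A = E[φ(γφ′ − φ)ᵀ], b = E[rφ] and C = E[φφᵀ]. Sign conventions vary across papers.
* It assumes that A is what the paper calls the FIM.
* It uses hand-picked numbers with C = I, not quantities derived from an MDP.
* It swaps in a plain ridge penalty η‖θ‖² as a stand-in. The abstract does not spell out R-GTD's actual reformulated objective.
* The names `mspbe` and `ridge_minimizer` are hypothetical.

```python
# Hypothetical toy numbers (not from the paper). The MSPBE is written as
# (A theta + b)^T C^{-1} (A theta + b) with C = I, and A is rank 1, which we
# assume plays the role of the singular feature interaction matrix (FIM).
A = [[-1.0, -1.0],
     [-1.0, -1.0]]
b = [1.0, 1.0]


def mspbe(theta):
    """MSPBE with C = I, i.e. the squared norm of A @ theta + b."""
    residual = [A[i][0] * theta[0] + A[i][1] * theta[1] + b[i] for i in range(2)]
    return residual[0] ** 2 + residual[1] ** 2


def ridge_minimizer(eta):
    """Unique minimizer of mspbe(theta) + eta * ||theta||^2 for eta > 0.

    Solves (A^T A + eta I) theta = -A^T b with Cramer's rule. At eta = 0 the
    2x2 system is singular (determinant 0) and the division fails.
    """
    m = [[sum(A[k][i] * A[k][j] for k in range(2)) + (eta if i == j else 0.0)
          for j in range(2)]
         for i in range(2)]
    rhs = [-sum(A[k][i] * b[k] for k in range(2)) for i in range(2)]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [(rhs[0] * m[1][1] - m[0][1] * rhs[1]) / det,
            (m[0][0] * rhs[1] - m[1][0] * rhs[0]) / det]


# Every point on the line theta_0 + theta_1 = 1 attains MSPBE = 0.
for theta in ([1.0, 0.0], [0.0, 1.0], [0.5, 0.5]):
    print(theta, mspbe(theta))

# With eta > 0 the solution is unique: both entries equal 2 / (4 + eta).
print(ridge_minimizer(0.1))
```

As η shrinks toward 0, the unique ridge solution approaches (0.5, 0.5). That is the minimum-norm point on the solution line θ₀ + θ₁ = 1, the usual limiting behavior of Tikhonov regularization.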

Gradient temporal-difference (GTD) learning algorithms are widely used for off-policy policy evaluation with function approximation. However, existing convergence analyses rely on the restrictive assumption that the so-called feature interaction matrix (FIM) is nonsingular. In practice, the FIM can become singular, leading to instability or degraded performance. In this paper, we propose a regularized optimization objective by reformulating the mean-square projected Bellman error (MSPBE) minimization. This formulation naturally yields a regularized GTD algorithm, referred to as R-GTD, which guarantees convergence to a unique solution even when the FIM is singular. We establish theoretical convergence guarantees and explicit error bounds for the proposed method, and validate its effectiveness through experiments.
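The abstract does not give R-GTD's update rule. The sketch below is therefore a stand-in, not the paper's algorithm: GTD2 (Sutton et al., 2009) with an added L2 shrinkage term −ηθ, which performs stochastic gradient descent on MSPBE/2 + ηθ²/2.

The toy problem is hypothetical and built so that the FIM, assumed to be A = E[φ(γφ′ − φ)ᵀ], is exactly zero:

* There is one feature, with φ = 1.0 and φ = 2.0 in the two states.
* Every transition goes to the second state, and γ = 0.9.
* States are sampled from d = (1/3, 2/3) rather than the target policy's own distribution.
* Importance ratios are set to 1 and rewards to 0.

Because A = 0 and b = 0, every θ has MSPBE = 0. The names `fim` and `regularized_gtd2`, and the step sizes α = 0.05 and β = 0.1, are illustrative choices.

```python
import random

# Hypothetical one-feature example (not from the paper): two states with
# features 1.0 and 2.0, every transition goes to the second state,
# gamma = 0.9, and states are sampled from d = (1/3, 2/3) instead of the
# target policy's own stationary distribution. Importance ratios are taken
# to be 1 and all rewards are 0.
PHI = [1.0, 2.0]
D = [1 / 3, 2 / 3]
NEXT_STATE = [1, 1]
GAMMA = 0.9


def fim():
    """A = E_d[phi(s) * (gamma * phi(s') - phi(s))], a 1x1 matrix; 0 here."""
    return sum(d * PHI[s] * (GAMMA * PHI[NEXT_STATE[s]] - PHI[s])
               for s, d in enumerate(D))


def regularized_gtd2(eta, alpha=0.05, beta=0.1, steps=2000, theta0=5.0, seed=0):
    """GTD2 plus an L2 shrinkage term: SGD on MSPBE / 2 + eta * theta^2 / 2."""
    rng = random.Random(seed)
    theta, w = theta0, 0.0
    for _ in range(steps):
        s = 0 if rng.random() < D[0] else 1
        phi, phi_next = PHI[s], PHI[NEXT_STATE[s]]
        delta = 0.0 + GAMMA * phi_next * theta - phi * theta  # TD error, reward 0
        # Both updates use the current (pre-update) w, as in GTD2.
        w_new = w + beta * (delta - phi * w) * phi
        theta += alpha * ((phi - GAMMA * phi_next) * (phi * w) - eta * theta)
        w = w_new
    return theta
```

With `eta=1.0`, the iterate is pulled toward θ = 0 from any `theta0`, and 0 is the unique minimizer of the regularized objective here. With `eta=0.0`, the conditional expected θ update is zero because A = 0, so nothing favors any particular θ. That is the non-uniqueness a singular FIM causes.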

