LG · OC · ML · Jun 14, 2021

Analysis of a Target-Based Actor-Critic Algorithm with Linear Function Approximation

arXiv:2106.07472v2 · 14 citations
AI Analysis

This work addresses a gap in reinforcement learning theory. It offers researchers and practitioners foundational insights into target networks, a technique that is widely used but poorly understood. The contribution is incremental, however, since it builds on existing actor-critic frameworks.

The paper tackles the lack of theoretical understanding of target networks in actor-critic methods. It provides the first theoretical analysis of an online target-based actor-critic algorithm with linear function approximation in the discounted reward setting. It establishes asymptotic convergence and gives a finite-time analysis that shows the impact of target networks.

Actor-critic methods integrating target networks have exhibited stupendous empirical success in deep reinforcement learning. However, a theoretical understanding of the use of target networks in actor-critic methods is largely missing in the literature. In this paper, we reduce this gap between theory and practice by proposing the first theoretical analysis of an online target-based actor-critic algorithm with linear function approximation in the discounted reward setting. Our algorithm uses three different timescales: one for the actor and two for the critic. Instead of using the standard single-timescale temporal difference (TD) learning algorithm as a critic, we use a two-timescale, target-based version of TD learning closely inspired by practical actor-critic algorithms that implement target networks. First, we establish asymptotic convergence results for both the critic and the actor under Markovian sampling. Then, we provide a finite-time analysis showing the impact of incorporating a target network into actor-critic methods.
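The three-timescale structure described in the abstract can be illustrated with a small sketch. This is a hypothetical illustration, not the authors' algorithm. The toy MDP, the features, the function name `target_actor_critic`, the step-size exponents, the Polyak-style target update θ̄ ← θ̄ + β(θ − θ̄), the use of the target-based TD error in the actor step, and the ordering α_t ≫ β_t ≫ η_t (critic fastest, target intermediate, actor slowest) are all assumptions chosen for illustration. The key idea it shows is that the critic's TD error bootstraps from the slowly moving target parameters θ̄ rather than from the current critic θ. All samples come from a single trajectory, which mimics Markovian sampling.

```python
import math
import random

# Hypothetical toy MDP (assumption, not from the paper): 3 states, 2 actions.
# P[s][a] lists (next_state, probability); R[s][a] is the reward.
P = {
    0: {0: [(0, 0.9), (1, 0.1)], 1: [(1, 0.8), (2, 0.2)]},
    1: {0: [(0, 0.7), (2, 0.3)], 1: [(2, 0.9), (0, 0.1)]},
    2: {0: [(0, 1.0)], 1: [(1, 0.5), (2, 0.5)]},
}
R = {0: {0: 0.0, 1: 0.1}, 1: {0: 0.0, 1: 0.5}, 2: {0: 1.0, 1: 0.0}}
GAMMA = 0.9

# Critic features phi(s): dimension 2 < 3 states, so the approximation is genuinely linear.
PHI = {0: [1.0, 0.0], 1: [0.5, 0.5], 2: [0.0, 1.0]}

def psi(s, a):
    """Actor features: one-hot over (state, action) pairs."""
    v = [0.0] * 6
    v[2 * s + a] = 1.0
    return v

def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

def policy(w, s):
    """Softmax policy pi_w(a|s) over the two actions."""
    prefs = [dot(w, psi(s, a)) for a in (0, 1)]
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]

def grad_log_pi(w, s, a):
    """grad_w log pi_w(a|s) = psi(s, a) - sum_b pi_w(b|s) psi(s, b)."""
    probs = policy(w, s)
    g = psi(s, a)
    for b in (0, 1):
        g = [gi - probs[b] * pb for gi, pb in zip(g, psi(s, b))]
    return g

def sample(pairs, rng):
    """Draw a next state from a list of (state, probability) pairs."""
    u, acc = rng.random(), 0.0
    for nxt, p in pairs:
        acc += p
        if u < acc:
            return nxt
    return pairs[-1][0]

def target_actor_critic(steps=20000, seed=0):
    rng = random.Random(seed)
    theta = [0.0, 0.0]      # critic parameters (fastest timescale)
    theta_bar = [0.0, 0.0]  # target parameters (intermediate timescale)
    w = [0.0] * 6           # actor parameters (slowest timescale)
    s = 0
    for t in range(1, steps + 1):
        # Assumed step sizes: beta/alpha -> 0 and eta/beta -> 0 as t grows.
        alpha = 1.0 / t ** 0.55
        beta = 1.0 / t ** 0.75
        eta = 1.0 / t ** 0.95
        probs = policy(w, s)
        a = 0 if rng.random() < probs[0] else 1
        s_next = sample(P[s][a], rng)
        r = R[s][a]
        # Target-based TD error: bootstrap from theta_bar, not theta.
        delta = r + GAMMA * dot(PHI[s_next], theta_bar) - dot(PHI[s], theta)
        theta = [th + alpha * delta * f for th, f in zip(theta, PHI[s])]
        # Target parameters slowly track the critic.
        theta_bar = [tb + beta * (th - tb) for tb, th in zip(theta_bar, theta)]
        # Actor: policy-gradient step weighted by the TD error.
        g = grad_log_pi(w, s, a)
        w = [wi + eta * delta * gi for wi, gi in zip(w, g)]
        s = s_next  # stay on one trajectory (Markovian sampling)
    return theta, theta_bar, w
```

Calling `target_actor_critic()` returns the final critic, target, and actor parameters. Placing the target step size between the critic and actor step sizes is one way to obtain the "one timescale for the actor, two for the critic" structure. Setting `theta_bar = theta` at every step would recover a standard single-timescale TD critic.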

