LGAIJun 3, 2024

Learning the Target Network in Function Space

arXiv:2406.01838v22 citations
AI Analysis

This addresses the challenge of stable value-function approximation in reinforcement learning, offering a domain-specific improvement for RL practitioners.

The paper tackles the problem of learning value functions in reinforcement learning by proposing Lookahead-Replicate (LR), a new algorithm that maintains equivalence between online and target networks in function space rather than parameter space. The result shows that LR leads to convergent behavior and significantly improves deep RL performance on the Atari benchmark.

We focus on the task of learning the value function in the reinforcement learning (RL) setting. This task is often solved by updating a pair of online and target networks while ensuring that the parameters of these two networks are equivalent. We propose Lookahead-Replicate (LR), a new value-function approximation algorithm that is agnostic to this parameter-space equivalence. Instead, the LR algorithm is designed to maintain an equivalence between the two networks in the function space. This value-based equivalence is obtained by employing a new target-network update. We show that LR leads to a convergent behavior in learning the value function. We also present empirical results demonstrating that LR-based target-network updates significantly improve deep RL on the Atari benchmark.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes