LG · Oct 21, 2022

Bridging the Gap Between Target Networks and Functional Regularization

MILA
arXiv:2210.12282v2 · 7 citations · h-index: 54
AI Analysis

This work addresses training instability in deep reinforcement learning and is relevant to practitioners. It is incremental, however, because it builds on existing Target Network methods.

The paper tackles the instability that bootstrapping causes in deep reinforcement learning. It shows that Target Networks act as an implicit regularizer and proposes Functional Regularization as a convex alternative, which yields better sample efficiency and performance.
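The decomposition below is our own illustration of why a Target Network behaves like an implicit regularizer; it is not necessarily the paper's exact derivation. Write $Q_\theta$ for the online network, $Q_{\bar\theta}$ for the lagging (target) parameters, and $\delta_\theta$ for the online TD error on a transition $(s, a, r, s', a')$. The weight $\kappa$ and the choice to evaluate the explicit penalty at the sampled $(s, a)$ are assumptions of this sketch.

```latex
\begin{aligned}
\delta_\theta &= r + \gamma Q_\theta(s',a') - Q_\theta(s,a), \\
\mathcal{L}_{\mathrm{TN}}(\theta)
  &= \big(r + \gamma Q_{\bar\theta}(s',a') - Q_\theta(s,a)\big)^2 \\
  &= \delta_\theta^2
   + \underbrace{\gamma^2 \big(Q_{\bar\theta}(s',a') - Q_\theta(s',a')\big)^2
   + 2\gamma\,\delta_\theta \big(Q_{\bar\theta}(s',a') - Q_\theta(s',a')\big)}_{\mathcal{R}_{\mathrm{TN}}(\theta)}, \\
\mathcal{L}_{\mathrm{FR}}(\theta)
  &= \delta_\theta^2 + \kappa \big(Q_\theta(s,a) - Q_{\bar\theta}(s,a)\big)^2 .
\end{aligned}
```

In this reading, the strength of $\mathcal{R}_{\mathrm{TN}}$ is fixed by $\gamma$ (inflexible), and its Hessian with respect to $(Q_\theta(s,a), Q_\theta(s',a'))$ is $\begin{pmatrix} 0 & 2\gamma \\ 2\gamma & -2\gamma^2 \end{pmatrix}$, whose determinant $-4\gamma^2 < 0$ makes it indefinite (non-convex). The explicit penalty in $\mathcal{L}_{\mathrm{FR}}$ is a convex quadratic in the Q-values, with a weight $\kappa$ that can be tuned independently of $\gamma$.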

Bootstrapping is behind much of the success of Deep Reinforcement Learning. However, learning the value function via bootstrapping often leads to unstable training due to fast-changing target values. Target Networks are employed to stabilize training by using an additional set of lagging parameters to estimate the target values. Despite the popularity of Target Networks, their effect on the optimization is still poorly understood. In this work, we show that they act as an implicit regularizer. This regularizer has disadvantages such as being inflexible and non-convex. To overcome these issues, we propose an explicit Functional Regularization that is a convex regularizer in function space and can easily be tuned. We analyze the convergence of our method theoretically and empirically demonstrate that replacing Target Networks with the more theoretically grounded Functional Regularization approach leads to better sample efficiency and performance improvements.
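To make the contrast concrete, here is a minimal NumPy sketch, assuming a linear Q-function $Q_\theta(s,a) = \phi(s,a)^\top \theta$, a fixed batch of transitions whose next actions are already chosen, and a squared-error functional penalty weighted by a hypothetical coefficient `kappa`. All function names are ours, not the paper's code, and treating the bootstrap target as a constant in the gradient (a semi-gradient update) is our assumption.

```python
import numpy as np


def q_values(phi: np.ndarray, theta: np.ndarray) -> np.ndarray:
    """Linear Q-function: one Q(s, a) per row of the feature matrix phi."""
    return phi @ theta


def td_error(theta, phi, phi_next, rewards, gamma):
    """Online TD error: r + gamma * Q_theta(s', a') - Q_theta(s, a)."""
    return rewards + gamma * q_values(phi_next, theta) - q_values(phi, theta)


def target_network_loss(theta, theta_bar, phi, phi_next, rewards, gamma):
    """TD loss whose bootstrap target uses the lagging parameters theta_bar."""
    target = rewards + gamma * q_values(phi_next, theta_bar)
    return np.mean((target - q_values(phi, theta)) ** 2)


def implicit_regularizer(theta, theta_bar, phi, phi_next, rewards, gamma):
    """Terms the target network adds on top of the online TD loss."""
    delta = td_error(theta, phi, phi_next, rewards, gamma)
    gap = q_values(phi_next, theta_bar) - q_values(phi_next, theta)
    return np.mean(gamma ** 2 * gap ** 2 + 2 * gamma * delta * gap)


def fr_loss(theta, theta_bar, phi, phi_next, rewards, gamma, kappa):
    """Online TD loss plus an explicit penalty that is convex in the Q-values."""
    delta = td_error(theta, phi, phi_next, rewards, gamma)
    penalty = (q_values(phi, theta) - q_values(phi, theta_bar)) ** 2
    return np.mean(delta ** 2) + kappa * np.mean(penalty)


def fr_semi_gradient(theta, theta_bar, phi, phi_next, rewards, gamma, kappa):
    """Gradient of fr_loss w.r.t. theta, holding the bootstrap target fixed."""
    n = phi.shape[0]
    delta = td_error(theta, phi, phi_next, rewards, gamma)
    diff = q_values(phi, theta) - q_values(phi, theta_bar)
    return (-2.0 * phi.T @ delta + 2.0 * kappa * phi.T @ diff) / n


# Usage on a synthetic random batch (illustrative data only).
rng = np.random.default_rng(0)
n, d, gamma, kappa = 32, 4, 0.99, 0.5
phi, phi_next = rng.normal(size=(n, d)), rng.normal(size=(n, d))
rewards = rng.normal(size=n)
theta, theta_bar = rng.normal(size=d), rng.normal(size=d)

# Target-network loss = online TD loss + implicit regularizer (an algebraic identity).
lhs = target_network_loss(theta, theta_bar, phi, phi_next, rewards, gamma)
rhs = np.mean(td_error(theta, phi, phi_next, rewards, gamma) ** 2) + implicit_regularizer(
    theta, theta_bar, phi, phi_next, rewards, gamma
)
print(np.isclose(lhs, rhs))  # True
```

The design point the sketch tries to surface: in `target_network_loss` the regularization is baked into the bootstrap target with a strength tied to `gamma`, whereas `fr_loss` exposes it as a separate term whose weight `kappa` can be tuned on its own.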
