LG · AI · Feb 20, 2023

Improving Deep Policy Gradients with Value Function Search

arXiv:2302.10145v1 · 18 citations · h-index: 33
Originality: Incremental advance
AI Analysis

This work addresses a bottleneck in Deep Policy Gradient methods for reinforcement learning, offering an incremental improvement in value approximation at low computational cost (no additional environment interactions, gradient computations, or ensembles).

The paper tackles value function approximations that get stuck in local optima in Deep Policy Gradient algorithms, which limits variance reduction and leads to sub-optimal policies. It introduces a Value Function Search method that uses a population of perturbed value networks to search for a better approximation, resulting in improved sample efficiency and higher returns on continuous control benchmarks.
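Why a poorly fitted value function limits variance reduction can be seen in a toy REINFORCE estimator that uses the value estimate V(s) as a baseline. The setup below is hypothetical and chosen only for illustration: two states with very different return levels, a one-parameter Gaussian policy, and two baselines, one equal to the exact expected return per state and one "underfit" baseline that ignores the state. The names `pg_gradient_samples`, `accurate_value`, `underfit_value`, and `OFFSETS` are illustrative, not from the paper.

```python
import random
import statistics

# Hypothetical toy problem, chosen only to illustrate baselines:
#   state s in {0, 1} with equal probability,
#   action a ~ N(mu, 1), so d/dmu log pi(a) = a - mu,
#   reward R = OFFSETS[s] - (a - 1)**2.
OFFSETS = (0.0, 20.0)
MU = 0.0  # at mu = 0 the true gradient d/dmu E[R] = -2 * (mu - 1) = 2


def pg_gradient_samples(value_fn, n=20000, seed=0):
    """REINFORCE estimates (R - V(s)) * d/dmu log pi(a), one per sample."""
    rng = random.Random(seed)  # same seed => same (s, a) draws for every baseline
    samples = []
    for _ in range(n):
        s = rng.randrange(2)
        a = rng.gauss(MU, 1.0)
        reward = OFFSETS[s] - (a - 1.0) ** 2
        score = a - MU
        samples.append((reward - value_fn(s)) * score)
    return samples


def accurate_value(s):
    # Exact E[R | s] at mu = 0, since E[(a - 1)^2] = Var[a] + 1 = 2.
    return OFFSETS[s] - 2.0


def underfit_value(s):
    # Ignores the state and predicts the average of the true values (-2 and 18).
    return 8.0


good = pg_gradient_samples(accurate_value)
poor = pg_gradient_samples(underfit_value)
print("accurate V:", statistics.mean(good), statistics.variance(good))
print("underfit V:", statistics.mean(poor), statistics.variance(poor))
```

Both estimators are unbiased, so their sample means approach the true gradient of 2, because the baseline depends only on the state. Analytically, though, the per-sample variance is 18 with the accurate value function and 118 with the underfit one: the part of the return the value estimate fails to explain stays in the gradient estimate as noise.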

Deep Policy Gradient (PG) algorithms employ value networks to drive the learning of parameterized policies and reduce the variance of the gradient estimates. However, value function approximation gets stuck in local optima and struggles to fit the actual return, limiting the variance reduction efficacy and leading policies to sub-optimal performance. This paper focuses on improving value approximation and analyzing the effects on Deep PG primitives such as value prediction, variance reduction, and correlation of gradient estimates with the true gradient. To this end, we introduce a Value Function Search that employs a population of perturbed value networks to search for a better approximation. Our framework does not require additional environment interactions, gradient computations, or ensembles, providing a computationally inexpensive approach to enhance the supervised learning task on which value networks train. Crucially, we show that improving Deep PG primitives results in improved sample efficiency and policies with higher returns using common continuous control benchmark domains.
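The abstract describes the search only at a high level. The sketch below is one minimal reading of it under stated assumptions: a linear model stands in for the value network, perturbations are isotropic Gaussian noise at a few hypothetical scales (`sigmas`), and a perturbed copy replaces the current parameters only if it lowers the mean squared error against returns already collected. The function names, noise scales, and population size are illustrative, not the paper's settings.

```python
import random


def predict(params, features):
    # Hypothetical stand-in for a value network: V(s) = w . phi(s) + bias.
    *weights, bias = params
    return sum(w * x for w, x in zip(weights, features)) + bias


def mse(params, batch):
    # Regression loss of the value estimate against stored returns.
    return sum((predict(params, phi) - ret) ** 2 for phi, ret in batch) / len(batch)


def value_function_search(params, batch, rng, sigmas=(0.01, 0.05, 0.1, 0.5), per_sigma=8):
    """One search step: perturb the current parameters, keep the best fit.

    Uses only forward passes over an already-collected batch, so it needs no
    new environment steps and no backpropagation. The current parameters are
    kept unless a perturbed copy achieves strictly lower loss.
    """
    best_params, best_loss = list(params), mse(params, batch)
    for sigma in sigmas:
        for _ in range(per_sigma):
            candidate = [p + rng.gauss(0.0, sigma) for p in params]
            loss = mse(candidate, batch)
            if loss < best_loss:
                best_params, best_loss = candidate, loss
    return best_params, best_loss


# Synthetic batch of (features, return) pairs; the true return is linear plus noise.
rng = random.Random(42)
batch = []
for _ in range(256):
    phi = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
    batch.append((phi, 2.0 * phi[0] - phi[1] + 0.5 + rng.gauss(0.0, 0.1)))

params = [0.0, 0.0, 0.0]  # pretend these came from a value net stuck at a poor fit
initial_loss = mse(params, batch)
losses = []
for _ in range(50):
    params, loss = value_function_search(params, batch, rng)
    losses.append(loss)
print(initial_loss, losses[-1])
```

Because the current parameters always compete with their perturbed copies, the loss on the batch can never increase from one search step to the next. In actual training, a step like this would presumably be interleaved with the usual gradient updates of the value network rather than run on its own as here.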
