LG, ML · Oct 27, 2020

Implicit Under-Parameterization Inhibits Data-Efficient Deep Reinforcement Learning

arXiv:2010.14498v2 · 161 citations
Originality: Incremental advance
AI Analysis

For practitioners, this paper addresses a fundamental bottleneck in data-efficient deep reinforcement learning, though the advance is incremental.

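The paper identifies an implicit under-parameterization phenomenon in value-based deep RL methods that use bootstrapping: additional gradient updates reduce the expressivity of the value network, measured as a drop in the rank of its learned features, and this drop typically coincides with a drop in performance. The authors show this on Atari and Gym benchmarks and show that mitigating the effect by controlling rank collapse can improve performance.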
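As a rough illustration of what "controlling rank collapse" can mean in practice, the sketch below adds a penalty on the spread of the singular values of a batch of features to a squared TD error. The penalty form σ_max² − σ_min², the weight `alpha`, and the function names `rank_penalty` and `penalized_td_loss` are assumptions for illustration, not necessarily the authors' exact regularizer. Only NumPy is used, so it computes loss values rather than gradients.

```python
import numpy as np


def rank_penalty(phi: np.ndarray) -> float:
    """Assumed penalty: sigma_max^2 - sigma_min^2 of the feature matrix.

    It is zero when all singular values are equal (well-conditioned features)
    and grows when one direction dominates, a symptom of rank collapse.
    """
    s = np.linalg.svd(phi, compute_uv=False)  # singular values, descending
    return float(s[0] ** 2 - s[-1] ** 2)


def penalized_td_loss(phi, w, rewards, next_values, gamma=0.99, alpha=0.001):
    """Mean squared TD error of a linear head on features, plus the penalty.

    Hypothetical inputs (terminal-state handling omitted):
      phi:         (batch, d) features of the current state-action pairs
      w:           (d,) linear head, so Q(s, a) = phi @ w
      rewards:     (batch,) observed rewards
      next_values: (batch,) bootstrapped values from a frozen target network
    """
    q = phi @ w
    targets = rewards + gamma * next_values  # treated as constants
    td_error = np.mean((q - targets) ** 2)
    return float(td_error + alpha * rank_penalty(phi))
```

In an autodiff framework the same expression would be minimized jointly with the TD loss; keeping `alpha` small lets the TD term dominate while still discouraging features from collapsing onto a few directions.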

We identify an implicit under-parameterization phenomenon in value-based deep RL methods that use bootstrapping: when value functions, approximated using deep neural networks, are trained with gradient descent using iterated regression onto target values generated by previous instances of the value network, more gradient updates decrease the expressivity of the current value network. We characterize this loss of expressivity via a drop in the rank of the learned value network features, and show that this typically corresponds to a performance drop. We demonstrate this phenomenon on Atari and Gym benchmarks, in both offline and online RL settings. We formally analyze this phenomenon and show that it results from a pathological interaction between bootstrapping and gradient-based optimization. We further show that mitigating implicit under-parameterization by controlling rank collapse can improve performance.
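The abstract characterizes lost expressivity as a drop in the rank of the learned features. Exact matrix rank is rarely informative for floating-point features, so a common choice is a thresholded "effective rank": the smallest k whose top-k singular values account for at least a 1 − δ fraction of the total singular-value mass. The sketch below assumes that definition with δ = 0.01; the function name `effective_rank`, the threshold, and the synthetic feature matrices are illustrative assumptions, not taken from this page.

```python
import numpy as np


def effective_rank(phi: np.ndarray, delta: float = 0.01) -> int:
    """Smallest k such that the top-k singular values of phi carry at least
    a (1 - delta) fraction of the sum of all singular values.

    phi: (num_samples, feature_dim) matrix, e.g. penultimate-layer features
    of the value network evaluated on a batch of states.
    """
    s = np.linalg.svd(phi, compute_uv=False)  # sorted in descending order
    cumulative = np.cumsum(s)
    if cumulative[-1] == 0.0:
        return 0  # all-zero features span nothing
    fraction = cumulative / cumulative[-1]
    # argmax on a boolean array returns the first index where it is True
    return int(np.argmax(fraction >= 1.0 - delta)) + 1


# Synthetic illustration: well-spread features versus nearly rank-1 features.
rng = np.random.default_rng(0)
healthy = rng.normal(size=(256, 32))
collapsed = np.outer(rng.normal(size=256), rng.normal(size=32))
collapsed += 1e-4 * rng.normal(size=(256, 32))
# The collapsed matrix should report a much smaller effective rank.
print(effective_rank(healthy), effective_rank(collapsed))
```

Tracking this number on a fixed batch of states after each round of target updates is one way to watch for the rank drop the abstract describes.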

Code Implementations: 1 repo
