LG AI | Mar 22

Rethinking Plasticity in Deep Reinforcement Learning

arXiv:2603.21173 | 88.8 | h-index: 4

AI Analysis

The paper addresses a critical challenge in deep RL for researchers and practitioners and offers a novel theoretical framework, though the contribution is incremental in that it builds on existing descriptive metrics.

This paper tackles plasticity loss in deep reinforcement learning, where neural networks struggle to adapt to non-stationary environments. It proposes the Optimization-Centric Plasticity (OCP) hypothesis and shows that networks with high dormancy in one task can achieve performance parity with randomly initialized networks when switched to a significantly different task.

This paper investigates the fundamental mechanisms driving plasticity loss in deep reinforcement learning (RL), a critical challenge where neural networks lose their ability to adapt to non-stationary environments. While existing research often relies on descriptive metrics like dormant neurons or effective rank, these summaries fail to explain the underlying optimization dynamics. We propose the Optimization-Centric Plasticity (OCP) hypothesis, which posits that plasticity loss arises because optimal points from previous tasks become poor local optima for new tasks, trapping parameters during task transitions and hindering subsequent learning. We theoretically establish the equivalence between neuron dormancy and zero-gradient states, demonstrating that the absence of gradient signals is the primary driver of dormancy. Our experiments reveal that plasticity loss is highly task-specific; notably, networks with high dormancy rates in one task can achieve performance parity with randomly initialized networks when switched to a significantly different task, suggesting that the network's capacity remains intact but is inhibited by the specific optimization landscape. Furthermore, our hypothesis elucidates why parameter constraints mitigate plasticity loss by preventing deep entrenchment in local optima. Validated across diverse non-stationary scenarios, our findings provide a rigorous optimization-based framework for understanding and restoring network plasticity in complex RL domains.
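The link between dormancy and zero-gradient states stated in the abstract can be illustrated with a small sketch. This is not the authors' code or their formal result. It assumes a single hidden ReLU layer, a squared-error loss, and a strict notion of dormancy (a neuron whose activation is exactly zero on every input in the batch); the function names, weights, and data below are all hypothetical. Under those assumptions, a dormant neuron's incoming weights receive exactly zero gradient, so gradient descent alone cannot move it.

```python
def relu(x):
    return x if x > 0.0 else 0.0

def forward(W1, b1, w2, x):
    """Return (pre-activations, activations, scalar output) for one input."""
    pre = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W1, b1)]
    act = [relu(p) for p in pre]
    out = sum(w * a for w, a in zip(w2, act))
    return pre, act, out

def dormant_and_grads(W1, b1, w2, batch, targets):
    """Flag strictly dormant hidden neurons and compute the batch-mean
    gradient of 0.5 * (out - y)^2 with respect to the first-layer weights."""
    n_hidden, n_in = len(W1), len(W1[0])
    mean_act = [0.0] * n_hidden
    grad_W1 = [[0.0] * n_in for _ in range(n_hidden)]
    for x, y in zip(batch, targets):
        pre, act, out = forward(W1, b1, w2, x)
        d_out = out - y  # derivative of the loss with respect to the output
        for j in range(n_hidden):
            mean_act[j] += act[j] / len(batch)
            # The ReLU derivative is 0 wherever the pre-activation is <= 0.
            d_pre = d_out * w2[j] * (1.0 if pre[j] > 0.0 else 0.0)
            for i in range(n_in):
                grad_W1[j][i] += d_pre * x[i] / len(batch)
    dormant = [m == 0.0 for m in mean_act]
    return dormant, grad_W1

# Hypothetical toy network: neuron 0 is active, neuron 1 has a negative
# pre-activation on every input and is therefore dormant.
W1 = [[1.0, 0.5], [-1.0, -1.0]]
b1 = [0.0, -0.5]
w2 = [1.0, 1.0]
batch = [[1.0, 2.0], [0.5, 0.1], [2.0, 0.0]]
targets = [1.0, 1.0, 1.0]

dormant, grad_W1 = dormant_and_grads(W1, b1, w2, batch, targets)
print("dormant:", dormant)
print("gradient rows:", grad_W1)
```

In this toy setting the second neuron is flagged as dormant and its gradient row is all zeros, while the first neuron still receives a learning signal. Thresholded dormancy scores and other activation functions would need a more careful treatment than this sketch provides.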
