LG NE · May 14

On the Stability of Growth in Structural Plasticity

arXiv:2605.15435 · 30.6

AI Analysis

For researchers in adaptive and continual learning, this work clarifies how growth differs from pruning and argues that growth should be evaluated not only as an architecture-search operator but also as a time-sensitive optimization process.

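The paper argues that growing new units during training is not simply the inverse of pruning: newborn units are often 'backward-starved', meaning they participate in the forward pass but receive much weaker gradient signal than incumbent units. The effect is minor in small MLP benchmarks and becomes clear in harder image-classification settings. There, Grow can reach high final accuracy during structural editing, but Prune is stronger when performance is averaged over the training trajectory or when the final sparse network is retrained from scratch.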
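The distinction between final accuracy and trajectory-averaged accuracy can be made concrete with a small sketch. The function names and the two accuracy curves here are hypothetical and invented purely for illustration; they are not results from the paper, and the paper's exact averaging scheme may differ.

```python
def final_accuracy(curve):
    """Accuracy at the last evaluation point of a training run."""
    return curve[-1]


def trajectory_averaged_accuracy(curve):
    """Mean accuracy over all evaluation points (equal spacing assumed)."""
    return sum(curve) / len(curve)


# Made-up curves for illustration only; NOT measurements from the paper.
# The "grow" run starts low and ends high; the "prune" run is steadier.
grow_curve = [0.40, 0.55, 0.62, 0.80, 0.90]
prune_curve = [0.70, 0.78, 0.82, 0.85, 0.88]

grow_wins_on_final = final_accuracy(grow_curve) > final_accuracy(prune_curve)
prune_wins_on_average = (
    trajectory_averaged_accuracy(prune_curve)
    > trajectory_averaged_accuracy(grow_curve)
)
```

With these invented curves the two metrics rank the methods in opposite order, which is the kind of disagreement the summary describes.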

Standard deep-learning pipelines usually choose the network architecture before training and keep it fixed throughout optimization. In contrast, a model can also be adapted by editing its structure during training, for example by pruning existing hidden-neuron units or growing new ones. Although growth is appealing for adaptive and continual systems, we show that it is not simply the inverse of pruning. Pruning selects among units that have participated in training from the start, whereas growth inserts new units into an already specialized optimization trajectory. We isolate this insertion problem and show that newborn units are often forward-active but backward-starved: they participate in the forward computation, yet receive much weaker gradient signal than incumbent units. This disadvantage is minor in small MLP benchmarks, but becomes clear in harder image-classification settings with a convolutional trunk. In these settings, Grow can achieve high final accuracy during the structural-editing procedure, while Prune is stronger when performance is averaged over the training trajectory or when the final sparse network is retrained from scratch. Interventions targeting optimizer state, insertion, selection, and trainability show that improving the integration of newborn units can improve adaptive performance, but does not automatically produce better final subnetworks. In continual-learning benchmarks stressing plasticity loss, Grow becomes competitive mainly when new units have enough time to integrate. Together, these results suggest that Grow should be evaluated not only as an architecture-search operator, but as a time-sensitive optimization process whose success depends on insertion stability.
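A minimal sketch of how a unit can be forward-active yet backward-starved, under stated assumptions: a one-hidden-layer tanh network with squared-error loss, and a newborn unit inserted with a near-zero outgoing weight. That insertion scheme is an assumption chosen for illustration; the abstract does not specify the paper's initialization or the mechanism behind the gradient gap. The function `per_unit_grad_norms` and all weight values are hypothetical.

```python
import math


def per_unit_grad_norms(x, y, W, v):
    """Return hidden activations and the gradient norm of each unit's
    incoming weights.

    Model: y_hat = sum_j v[j] * tanh(W[j] . x), loss = 0.5 * (y_hat - y)**2.
    """
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W]
    err = sum(vj * hj for vj, hj in zip(v, h)) - y
    x_norm = math.sqrt(sum(xi * xi for xi in x))
    # dL/dW[j] = err * v[j] * (1 - h[j]**2) * x, so its norm is |scale| * ||x||.
    norms = [abs(err * vj * (1.0 - hj * hj)) * x_norm for vj, hj in zip(v, h)]
    return h, norms


x = [1.0, -0.5, 0.25, 0.75]
y = 0.0

# Three incumbent units with ordinary-sized outgoing weights (illustrative).
W = [[0.3, -0.2, 0.1, 0.4], [-0.4, 0.1, 0.5, -0.2], [0.2, 0.3, -0.3, 0.1]]
v = [0.8, -0.6, 0.7]

# Grow one unit: ordinary incoming weights, near-zero outgoing weight
# (assumed insertion scheme, so the network's output barely changes).
W.append([0.5, 0.4, -0.2, 0.3])
v.append(1e-3)

h, norms = per_unit_grad_norms(x, y, W, v)
newborn_activation = abs(h[-1])          # clearly nonzero: forward-active
newborn_grad = norms[-1]                 # tiny: scaled by the 1e-3 weight
incumbent_grad = sum(norms[:-1]) / 3     # mean over incumbent units
```

In this toy setup the newborn unit's activation is comparable to the incumbents', while the gradient reaching its incoming weights is scaled down by its small outgoing weight. Tracking the ratio `newborn_grad / incumbent_grad` over training steps is one way such a gap could be monitored.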
