LG | DIS-NN | Jan 29, 2025

Growing Neural Networks: Dynamic Evolution through Gradient Descent

arXiv:2501.18012v2 | 4 citations | h-index: 43 | Proc R Soc A
AI Analysis

This work addresses the inefficiency of large, static neural networks for researchers and practitioners, offering a potentially more energy-efficient approach, though it appears incremental as it builds on existing gradient-descent methods.

The paper tackles the problem of static neural network structures by introducing two methods for evolving small networks into larger ones during training. On nonlinear regression and classification tasks, the grown networks consistently outperform static networks of equivalent final size.

In contrast to conventional artificial neural networks, which are structurally static, we present two approaches for evolving small networks into larger ones during training. The first method employs an auxiliary weight that directly controls network size, while the second uses a controller-generated mask to modulate neuron participation. Both approaches optimize network size through the same gradient-descent algorithm that updates the network's weights and biases. We evaluate these growing networks on nonlinear regression and classification tasks, where they consistently outperform static networks of equivalent final size. We then explore the hyperparameter space of these networks to find associated scaling relations relative to their static counterparts. Our results suggest that starting small and growing naturally may be preferable to simply starting large, particularly as neural networks continue to grow in size and energy consumption.
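The first idea in the abstract, a single trainable quantity that controls network size and is updated by the same gradient descent as the weights and biases, can be illustrated with a minimal sketch. This is not the paper's formulation: the sigmoid mask, the temperature `temp`, the size parameter `s`, the toy sine-regression data, and every function name below are assumptions chosen only to show how a size variable can receive gradients.

```python
import math
import random

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def mask(s, n, temp=0.5):
    # Hypothetical soft mask: neuron i participates fully when i << s
    # and is nearly switched off when i >> s.
    return [sigmoid((s - i) / temp) for i in range(n)]

def forward(params, x, temp=0.5):
    # One hidden tanh layer whose neurons are scaled by the mask.
    w, b, v = params["w"], params["b"], params["v"]
    m = mask(params["s"], len(w), temp)
    h = [math.tanh(w[i] * x + b[i]) for i in range(len(w))]
    y = sum(m[i] * v[i] * h[i] for i in range(len(w)))
    return y, h, m

def loss_and_grads(params, data, temp=0.5):
    # Mean of 0.5 * error^2 and its hand-derived gradients,
    # including the gradient with respect to the size parameter s.
    n = len(params["w"])
    g = {"w": [0.0] * n, "b": [0.0] * n, "v": [0.0] * n, "s": 0.0}
    loss = 0.0
    for x, t in data:
        y, h, m = forward(params, x, temp)
        e = y - t
        loss += 0.5 * e * e
        for i in range(n):
            dh = 1.0 - h[i] ** 2
            g["v"][i] += e * m[i] * h[i]
            g["w"][i] += e * m[i] * params["v"][i] * dh * x
            g["b"][i] += e * m[i] * params["v"][i] * dh
            # d(mask_i)/ds = m_i * (1 - m_i) / temp
            g["s"] += e * params["v"][i] * h[i] * m[i] * (1.0 - m[i]) / temp
    k = len(data)
    for key in ("w", "b", "v"):
        g[key] = [gi / k for gi in g[key]]
    g["s"] /= k
    return loss / k, g

def train(steps=500, lr=0.02, n_max=8, seed=0):
    # Start with roughly one active neuron (s = 1) out of n_max available.
    rng = random.Random(seed)
    params = {
        "w": [rng.uniform(-1, 1) for _ in range(n_max)],
        "b": [rng.uniform(-1, 1) for _ in range(n_max)],
        "v": [rng.uniform(-1, 1) for _ in range(n_max)],
        "s": 1.0,
    }
    data = [(x / 10.0, math.sin(x / 10.0)) for x in range(-20, 21)]
    history = []
    for _ in range(steps):
        loss, g = loss_and_grads(params, data)
        history.append(loss)
        for key in ("w", "b", "v"):
            params[key] = [p - lr * gi for p, gi in zip(params[key], g[key])]
        params["s"] -= lr * g["s"]  # size is updated like any other weight
    return params, history

if __name__ == "__main__":
    params, history = train()
    print("initial loss:", history[0], "final loss:", history[-1])
    print("size parameter s:", params["s"])
    print("effective size:", sum(mask(params["s"], len(params["w"]))))
```

In this sketch the sum of the mask values serves as a rough "effective size". Whether `s` grows or shrinks depends on the data and initialization; the paper's own growth rule and its controller-based second method may differ substantially.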

Code Implementations: 1 repo
