LGJul 10, 2023

Self-Expanding Neural Networks

arXiv:2307.04526v315 citationsh-index: 23
Originality Incremental advance
AI Analysis

This addresses the challenge of architecture tuning for machine learning practitioners, offering an incremental improvement over manual or fixed-size methods.

The paper tackles the problem of neural network architecture selection by introducing a method that starts with a small architecture and expands it during training without restarting, based on a natural gradient approach to add width and depth when beneficial, demonstrating benefits in classification and regression tasks with uncertain architecture sizes.

The results of training a neural network are heavily dependent on the architecture chosen; and even a modification of only its size, however small, typically involves restarting the training process. In contrast to this, we begin training with a small architecture, only increase its capacity as necessary for the problem, and avoid interfering with previous optimization while doing so. We thereby introduce a natural gradient based approach which intuitively expands both the width and depth of a neural network when this is likely to substantially reduce the hypothetical converged training loss. We prove an upper bound on the ``rate'' at which neurons are added, and a computationally cheap lower bound on the expansion score. We illustrate the benefits of such Self-Expanding Neural Networks with full connectivity and convolutions in both classification and regression problems, including those where the appropriate architecture size is substantially uncertain a priori.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes