LG · OC · Jan 30, 2022

Training Thinner and Deeper Neural Networks: Jumpstart Regularization

arXiv:2201.12795v2 · 5 citations
AI Analysis

This paper addresses the challenge, relevant to machine learning practitioners, of training efficient deep neural networks. It appears incremental, however, because it builds on existing regularization techniques.

The paper tackles the problem of training deeper neural networks without increasing their width, which would otherwise raise computational cost and lead to overparameterization. It proposes jumpstart regularization, which prevents neurons from dying or becoming linear, and obtains thinner, deeper, and more parameter-efficient models.

Neural networks are more expressive when they have multiple layers. In turn, conventional training methods succeed only if the depth does not lead to numerical issues such as exploding or vanishing gradients, which occur less frequently when the layers are sufficiently wide. However, increasing width to attain greater depth entails the use of heavier computational resources and leads to overparameterized models. These subsequent issues have been partially addressed by model compression methods such as quantization and pruning, some of which rely on normalization-based regularization of the loss function to make the effect of most parameters negligible. In this work, we propose instead to use regularization for preventing neurons from dying or becoming linear, a technique which we denote as jumpstart regularization. In comparison to conventional training, we obtain neural networks that are thinner, deeper, and, most importantly, more parameter-efficient.
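To make "dying" and "becoming linear" concrete, here is a minimal sketch in plain Python, assuming ReLU units and one batch of pre-activations per layer. A unit is treated as dead if its pre-activation is non-positive for every sample in the batch (so ReLU outputs zero everywhere), and as linear if it is positive for every sample (so ReLU acts as the identity). The hinge penalties below, and the names `unit_states`, `jumpstart_style_penalty`, `regularized_loss`, `margin`, and `weight`, are illustrative assumptions, not the authors' exact formulation.

```python
from typing import List, Sequence

# preacts[n][j] is the pre-activation of unit j on batch sample n (assumed layout).
Matrix = Sequence[Sequence[float]]


def unit_states(preacts: Matrix) -> List[str]:
    """Classify each ReLU unit of one layer from a batch of pre-activations."""
    states = []
    for column in zip(*preacts):  # one column = one unit across the batch
        if max(column) <= 0.0:
            states.append("dead")       # ReLU outputs 0 for every sample
        elif min(column) > 0.0:
            states.append("linear")     # ReLU is the identity for every sample
        else:
            states.append("nonlinear")  # ReLU switches on and off across the batch
    return states


def jumpstart_style_penalty(preacts: Matrix, margin: float = 0.0) -> float:
    """Hypothetical hinge penalty: zero only for units that are active on some
    sample (above +margin) and inactive on another (below -margin)."""
    total = 0.0
    for column in zip(*preacts):
        # Positive when no sample lifts the unit above +margin (dead-like unit).
        total += max(0.0, margin - max(column))
        # Positive when no sample pushes the unit below -margin (linear-like unit).
        total += max(0.0, min(column) + margin)
    return total


def regularized_loss(task_loss: float, layer_preacts: Sequence[Matrix],
                     weight: float = 0.1, margin: float = 0.0) -> float:
    """Task loss plus a weighted sum of the per-layer penalties (hypothetical weighting)."""
    return task_loss + weight * sum(
        jumpstart_style_penalty(z, margin) for z in layer_preacts
    )


if __name__ == "__main__":
    batch = [[-1.0, 2.0, 0.5],
             [-3.0, 1.0, -0.5]]
    print(unit_states(batch))              # ['dead', 'linear', 'nonlinear']
    print(jumpstart_style_penalty(batch))  # 2.0
```

In the example, unit 0 never turns on and unit 1 never turns off, so each contributes 1.0 to the penalty, while unit 2 already behaves nonlinearly and costs nothing. In an autodiff framework the same `max`/`min` expressions would pass gradient only to the extreme sample of each offending unit, nudging it across zero without disturbing units that already switch on and off.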

Code Implementations: 1 repo