Growth strategies for arbitrary DAG neural architectures
This work targets efficient neural network training, a concern for both researchers and practitioners, though the contribution appears incremental because it extends existing neural architecture growth methods.
The paper tackles the problem of high computational and environmental costs in training large neural networks by proposing a method to grow neural architectures directly during training, aiming to reduce both training and inference durations.
Deep learning has achieved impressive results, but at the cost of training huge neural networks. The larger the architecture, the higher the computational, financial, and environmental costs of training and inference. We aim to reduce both training and inference time. We focus on Neural Architecture Growth, which increases the size of a small model when needed, directly during training, using information from backpropagation. We extend existing work to freely grow neural networks in the form of any Directed Acyclic Graph by reducing expressivity bottlenecks in the architecture. We also explore strategies to reduce excessive computation and to steer network growth toward more parameter-efficient architectures.
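To make the idea of growing a model during training concrete, the following is a minimal sketch of one generic growth step. It is not the method of this paper: it widens a single hidden layer of a small bias-free MLP and sets the outgoing weights of the new units to zero, so the network's output is unchanged at the moment of growth. The function names (`forward`, `grow_hidden`) and the zero-initialization rule are illustrative assumptions; the approach described above instead selects new units from backpropagation information to reduce expressivity bottlenecks, and applies to arbitrary DAGs.

```python
import random

def relu(v):
    return [max(0.0, a) for a in v]

def matvec(W, x):
    """Multiply matrix W (one row per output unit) by vector x."""
    return [sum(w * a for w, a in zip(row, x)) for row in W]

def forward(x, W1, W2):
    """Two-layer MLP without biases: W2 * relu(W1 * x)."""
    return matvec(W2, relu(matvec(W1, x)))

def grow_hidden(W1, W2, n_new, rng):
    """Return copies of (W1, W2) with n_new extra hidden units.

    Assumption for illustration: incoming weights of the new units are
    small random values and outgoing weights are zero, so the output
    of the network is the same before and after growth.
    """
    d_in = len(W1[0])
    new_rows = [[rng.gauss(0.0, 0.1) for _ in range(d_in)] for _ in range(n_new)]
    W1_grown = [row[:] for row in W1] + new_rows
    W2_grown = [row[:] + [0.0] * n_new for row in W2]
    return W1_grown, W2_grown

rng = random.Random(0)
W1 = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(4)]  # 4 hidden units, 3 inputs
W2 = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(2)]  # 2 outputs
x = [0.5, -1.0, 2.0]

y_before = forward(x, W1, W2)
W1g, W2g = grow_hidden(W1, W2, 2, rng)
y_after = forward(x, W1g, W2g)
```

After growth the hidden layer has six units instead of four, and `y_after` matches `y_before`. The new units only start to contribute once subsequent gradient steps move their outgoing weights away from zero.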