AutoInit: Automatic Initialization via Jacobian Tuning
This work addresses the need for efficient, automatic initialization in deep learning, reducing reliance on trial and error or on sub-optimal inherited methods. It is incremental, however, in that it builds on existing initialization techniques.
The authors tackled the problem of finding good initialization for deep neural networks by introducing AutoInit, a cheap algorithm that automatically tunes hyperparameters to criticality using Jacobians between network blocks. The method achieved good performance on vision tasks with ResMLP and VGG architectures.
Good initialization is essential for training Deep Neural Networks (DNNs). Often such an initialization is found through trial and error, which has to be repeated every time an architecture is substantially modified, or is inherited from smaller networks, leading to sub-optimal initialization. In this work we introduce a new and cheap algorithm that finds a good initialization automatically for general feed-forward DNNs. The algorithm utilizes the Jacobian between adjacent network blocks to tune the network hyperparameters to criticality. We solve the dynamics of the algorithm for fully connected networks with ReLU and derive conditions for its convergence. We then extend the discussion to more general architectures with BatchNorm and residual connections. Finally, we apply our method to ResMLP and VGG architectures, where the automatic one-shot initialization found by our method shows good performance on vision tasks.
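To make the idea of tuning to criticality with block-to-block Jacobians concrete, the following is a minimal sketch for a bias-free fully connected ReLU network. It is not the authors' implementation. The function names (`jacobian_norm`, `tune_scales`), the per-layer `scale` hyperparameter, the width and depth, and the one-step multiplicative rescaling are all assumptions made for illustration. The sketch treats a layer as critical when the averaged squared Frobenius norm of the Jacobian between adjacent layers equals 1.

```python
import math
import random


def jacobian_norm(weights, pre_acts, scale):
    """Averaged squared Frobenius norm of d h_next / d h for one ReLU layer.

    With h_next = scale * W @ relu(h), the Jacobian is scale * W * diag(relu'(h)).
    """
    n_out = len(weights)
    total = 0.0
    for row in weights:
        for w_ij, h_j in zip(row, pre_acts):
            if h_j > 0.0:  # relu'(h_j) is 1 for active units and 0 otherwise
                total += (scale * w_ij) ** 2
    return total / n_out


def tune_scales(width=200, depth=4, init_scale=0.5, seed=0):
    """Rescale each layer so its adjacent-layer Jacobian norm becomes 1.

    Illustrative assumption: a single multiplicative rescaling per layer,
    which suffices here because the ReLU mask does not depend on `scale`.
    """
    rng = random.Random(seed)
    h = [rng.gauss(0.0, 1.0) for _ in range(width)]  # input pre-activations
    results = []
    for _ in range(depth):
        # Weights with variance 1/width; `scale` is the tunable hyperparameter.
        weights = [[rng.gauss(0.0, 1.0) / math.sqrt(width) for _ in range(width)]
                   for _ in range(width)]
        norm = jacobian_norm(weights, h, init_scale)
        scale = init_scale / math.sqrt(norm)      # drive the norm to 1
        tuned = jacobian_norm(weights, h, scale)  # re-measure after tuning
        results.append((scale, tuned))
        # Propagate the signal through the tuned layer to the next one.
        acts = [max(x, 0.0) for x in h]
        h = [scale * sum(w * a for w, a in zip(row, acts)) for row in weights]
    return results


if __name__ == "__main__":
    for layer, (scale, norm) in enumerate(tune_scales(), start=1):
        print(f"layer {layer}: scale={scale:.3f}, jacobian_norm={norm:.3f}")
```

Under these assumptions, each tuned scale should land close to the square root of 2, since roughly half of the ReLU units are active. That agrees with the known critical weight variance for ReLU networks. Architectures with BatchNorm or residual connections would need a different Jacobian computation, which this sketch does not attempt.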