LG AI DS ML | Feb 2, 2022

Nonlinear Initialization Methods for Low-Rank Neural Networks

Kiran Vodrahalli, Rakesh Shivanna, Maheswaran Sathiamoorthy, Sagar Jain, Ed H. Chi

arXiv:2202.00834v37.84 citations

Originality: Incremental advance

AI Analysis

This work addresses the initialization bottleneck for low-rank neural networks, which is crucial for efficient training in resource-constrained settings, though it appears incremental as it builds on existing low-rank methods.

The paper tackles the initialization of low-rank neural networks with a framework that approximates the function computed by each layer rather than the layer's parameter values. It proves a significant gap between this approach and prior spectral initialization for ReLU networks, particularly as the rank decreases or the input dimension increases, and validates the approach by training ResNet and EfficientNet models on ImageNet.

We propose a novel low-rank initialization framework for training low-rank deep neural networks -- networks where the weight parameters are re-parameterized by products of two low-rank matrices. The most successful prior approach, spectral initialization, draws a sample from the initialization distribution for the full-rank setting and then optimally approximates the full-rank initialization parameters in the Frobenius norm with a pair of low-rank initialization matrices via singular value decomposition. Our method is inspired by the insight that approximating the function corresponding to each layer is more important than approximating the parameter values. We provably demonstrate that there is a significant gap between these two approaches for ReLU networks, particularly as the desired rank of the approximating weights decreases, or as the dimension of the inputs to the layer increases (the latter point holds when the network width is super-linear in dimension). Along the way, we provide the first provably efficient algorithm for solving the ReLU low-rank approximation problem for fixed parameter rank $r$ -- previously, it was unknown whether the problem was computationally tractable even for rank $1$. We also provide a practical algorithm to solve this problem which is no more expensive than the existing spectral initialization approach, and validate our theory by training ResNet and EfficientNet models (He et al., 2016; Tan & Le, 2019) on ImageNet (Russakovsky et al., 2015).
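
To make the contrast in the abstract concrete, the following minimal NumPy sketch implements the spectral initialization baseline as described above (truncated SVD of a sampled full-rank weight matrix) and measures two errors: the Frobenius-norm gap in parameter space and a Monte Carlo estimate of the gap between ReLU layer outputs. It does not implement the paper's proposed algorithm. The function names, the He-style sampling scale, the layer sizes, and the standard Gaussian inputs are all illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def spectral_init(W, r):
    """Best rank-r factorization of W in Frobenius norm via truncated SVD."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    root = np.sqrt(s[:r])
    A = U[:, :r] * root            # shape (m, r): scale each kept column
    B = root[:, None] * Vt[:r, :]  # shape (r, d): scale each kept row
    return A, B


def relu(z):
    return np.maximum(z, 0.0)


def param_error(W, A, B):
    """Parameter-space error: Frobenius norm of W - A @ B."""
    return np.linalg.norm(W - A @ B, "fro")


def function_error(W, A, B, X):
    """Mean squared gap between ReLU layer outputs over the rows of X."""
    diff = relu(X @ W.T) - relu(X @ (A @ B).T)
    return np.mean(np.sum(diff ** 2, axis=1))


rng = np.random.default_rng(0)
d, m, r = 32, 64, 4  # input dim, layer width, target rank (assumed values)

# Assumption: He-style Gaussian sample stands in for the full-rank init.
W = rng.normal(scale=np.sqrt(2.0 / d), size=(m, d))
A, B = spectral_init(W, r)

# Assumption: standard Gaussian inputs for the Monte Carlo estimate.
X = rng.normal(size=(2000, d))

p_err = param_error(W, A, B)
f_err = function_error(W, A, B, X)
print(f"parameter error: {p_err:.4f}, function error: {f_err:.4f}")
```

Spectral initialization minimizes `param_error` by construction; the abstract's argument is that a low-rank pair chosen to minimize a quantity like `function_error` instead can differ substantially for ReLU layers, especially at small `r` or large `d`.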
