NE · CV · LG · May 2, 2021

Data-driven Weight Initialization with Sylvester Solvers

arXiv:2105.10335v1 · 5 citations
Originality: Incremental advance
AI Analysis

This work addresses inefficient parameter initialization in deep neural networks, a practical problem for deep learning practitioners, and offers an incremental improvement over existing methods.

The paper proposes a data-driven method for initializing deep neural network parameters: each layer is initialized from its input activations by solving a Sylvester equation. The method outperforms random initialization, especially in few-shot and fine-tuning settings.

In this work, we propose a data-driven scheme to initialize the parameters of a deep neural network. This is in contrast to traditional approaches, which randomly initialize parameters by sampling from transformed standard distributions. Such methods do not use the training data to produce a more informed initialization. Our method uses a sequential layer-wise approach where each layer is initialized using its input activations. The initialization is cast as an optimization problem where we minimize a combination of encoding and decoding losses of the input activations, which is further constrained by a user-defined latent code. The optimization problem is then restructured into the well-known Sylvester equation, which has fast and efficient gradient-free solutions. Our data-driven method achieves a boost in performance compared to random initialization methods, both before the start of training and after training is over. We show that our proposed method is especially effective in few-shot and fine-tuning settings. We conclude this paper with analyses of time complexity and of the effect of different latent codes on recognition performance.
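The abstract does not give the exact objective, so the following is only a sketch under stated assumptions of how an encode/decode loss can collapse into a Sylvester equation. It assumes a semantic-autoencoder-style loss for one layer with input activations X (d x N), latent code S (k x N) and weights W (k x d): ||X - WᵀS||² + λ||WX - S||². Setting the gradient to zero gives SSᵀW + λWXXᵀ = (1 + λ)SXᵀ, which has the form AW + WB = C and can be solved without gradient steps by SciPy's `solve_sylvester` (Bartels-Stewart). The random Gaussian latent code, the ReLU between layers, the weight `lam`, and the helper names `sylvester_init` and `init_mlp` are all hypothetical choices for illustration, not the paper's.

```python
import numpy as np
from scipy.linalg import solve_sylvester


def sylvester_init(X, S, lam=1.0):
    """Closed-form weights W (k x d) for one layer (assumed objective).

    X   : (d, N) input activations, one column per sample.
    S   : (k, N) latent code; needs full row rank (k <= N) so that
          A = S S^T is positive definite and the solution is unique.
    lam : hypothetical weight between the decoding and encoding losses.

    Minimizes ||X - W.T @ S||_F^2 + lam * ||W @ X - S||_F^2. Setting the
    gradient to zero yields the Sylvester equation A W + W B = C.
    """
    A = S @ S.T                   # (k, k) from the decoding term
    B = lam * (X @ X.T)           # (d, d) from the encoding term
    C = (1.0 + lam) * (S @ X.T)   # (k, d) right-hand side
    return solve_sylvester(A, B, C)  # gradient-free direct solve


def init_mlp(X, widths, lam=1.0, seed=0):
    """Sequential layer-wise initialization of a bias-free ReLU MLP.

    Each layer is initialized from the activations produced by the
    already-initialized layers before it (hypothetical random latent code).
    """
    rng = np.random.default_rng(seed)
    weights = []
    H = X
    for k in widths:
        # Assumed latent code: i.i.d. Gaussian targets for this layer.
        S = rng.standard_normal((k, H.shape[1]))
        W = sylvester_init(H, S, lam)
        weights.append(W)
        H = np.maximum(W @ H, 0.0)  # ReLU activations feed the next layer
    return weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((20, 256))   # 20-dim inputs, 256 samples
    Ws = init_mlp(X, widths=[16, 8])
    print([W.shape for W in Ws])         # [(16, 20), (8, 16)]
```

Because the assumed objective is a convex quadratic in W, the Sylvester solution is its exact minimizer, so no learning rate or iteration count is involved. A direct Bartels-Stewart solve costs roughly cubic time in k and d, which is one reason such an initialization can stay cheap relative to training.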

