LG · DIS-NN · STAT-MECH · Mar 23, 2021

Initializing ReLU networks in an expressive subspace of weights

arXiv:2103.12499v3 · 4 citations
AI Analysis

This work addresses training efficiency in deep neural networks, but it is incremental: it builds on prior initialization strategies.

The paper tackles the problem of signal-correlation saturation in deep ReLU networks by analyzing how correlations between weights affect signal propagation. It shows that anti-correlated weights give rise to a chaotic phase in which signal correlations saturate below unity. This leads to a new initialization scheme that trains faster than existing methods in a teacher-student setting.
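The saturation described here can be illustrated with the standard mean-field recursion for a fully connected ReLU network with *uncorrelated* Gaussian weights, which is the baseline the paper improves on, not its correlated-weight analysis. Assuming weights W_ij ~ N(0, σ_w²/N) and biases b_i ~ N(0, σ_b²), the pre-activation variance q and the correlation ρ between two inputs evolve through the closed-form ReLU (arc-cosine) kernel. The function names and default values below are illustrative choices, not taken from the paper.

```python
import math


def relu_kernel(q: float, rho: float) -> float:
    """E[relu(u) * relu(v)] for zero-mean Gaussians u, v with common variance q and correlation rho."""
    rho = max(-1.0, min(1.0, rho))  # clamp rounding error so acos stays defined
    return q / (2.0 * math.pi) * (math.sqrt(1.0 - rho * rho) + rho * (math.pi - math.acos(rho)))


def correlation_trajectory(rho0: float, depth: int, sigma_w2: float = 2.0,
                           sigma_b2: float = 0.0, q0: float = 1.0) -> list[float]:
    """Iterate the mean-field (variance, correlation) map for i.i.d. weights; return rho at each layer."""
    q, rho = q0, rho0
    trajectory = [rho]
    for _ in range(depth):
        # Variance map: E[relu(u)^2] = q / 2 for u ~ N(0, q).
        q_next = sigma_w2 * q / 2.0 + sigma_b2
        # Covariance map between the two signals' pre-activations.
        c_next = sigma_w2 * relu_kernel(q, rho) + sigma_b2
        q, rho = q_next, c_next / q_next
        trajectory.append(rho)
    return trajectory


# Two initially orthogonal inputs in a bias-free, He-scaled (sigma_w2 = 2) network.
traj = correlation_trajectory(rho0=0.0, depth=50)
for layer in (0, 1, 5, 10, 25, 50):
    print(f"layer {layer:2d}: rho = {traj[layer]:.4f}")
```

With σ_b² = 0 the map reduces to ρ' = (√(1 − ρ²) + ρ(π − arccos ρ))/π, independent of σ_w². It has a fixed point at ρ = 1 with slope exactly 1, so any two inputs become steadily more correlated with depth, approaching unity polynomially rather than exponentially.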

Using a mean-field theory of signal propagation, we analyze the evolution of correlations between two signals propagating forward through a deep ReLU network with correlated weights. Signals become highly correlated in deep ReLU networks with uncorrelated weights. We show that ReLU networks with anti-correlated weights can avoid this fate and have a chaotic phase where the signal correlations saturate below unity. Consistent with this analysis, we find that networks initialized with anti-correlated weights can train faster (in a teacher-student setting) by taking advantage of the increased expressivity in the chaotic phase. Combining this with a previously proposed strategy of using an asymmetric initialization to reduce dead node probability, we propose an initialization scheme that allows faster training and learning than the best-known initializations.
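The contrast between uncorrelated and anti-correlated weights can be probed with a small finite-width simulation. The anti-correlated ensemble here is an assumption chosen for illustration: each row of i.i.d. Gaussian weights is re-centred to sum to zero, which is the most anti-correlated exchangeable choice (pairwise correlation -1/(N-1) within a row) and is not necessarily the paper's parametrization. With such rows, h = W a equals Z(a - ā·1), so the common positive mean shared by all ReLU outputs, a major source of the growing correlation in the i.i.d. case, is projected out. Width, depth, seeds and helper names are hypothetical; biases are zero and ReLU is positively homogeneous, so the weight scale does not affect the measured cosine.

```python
import math
import random


def cosine(x: list[float], y: list[float]) -> float:
    """Cosine similarity, used as the finite-width estimate of the signal correlation."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))


def sample_layer(n: int, rng: random.Random, zero_row_sum: bool) -> list[list[float]]:
    """Hypothetical sampler: i.i.d. N(0, 2/n) weights, optionally re-centred so each row sums to zero."""
    std = math.sqrt(2.0 / n)
    rows = []
    for _ in range(n):
        row = [rng.gauss(0.0, std) for _ in range(n)]
        if zero_row_sum:
            mean = sum(row) / n
            row = [w - mean for w in row]  # within-row correlation becomes -1/(n-1)
        rows.append(row)
    return rows


def final_correlation(x1: list[float], x2: list[float], depth: int,
                      zero_row_sum: bool, seed: int = 0) -> float:
    """Push two inputs through a bias-free ReLU net of width len(x1); return the last pre-activation cosine."""
    rng = random.Random(seed)
    h1, h2 = list(x1), list(x2)
    for _ in range(depth):
        weights = sample_layer(len(h1), rng, zero_row_sum)
        a1 = [max(0.0, v) for v in h1]
        a2 = [max(0.0, v) for v in h2]
        h1 = [sum(w * a for w, a in zip(row, a1)) for row in weights]
        h2 = [sum(w * a for w, a in zip(row, a2)) for row in weights]
    return cosine(h1, h2)


data_rng = random.Random(42)
width, depth = 200, 20
x1 = [data_rng.gauss(0.0, 1.0) for _ in range(width)]
x2 = [data_rng.gauss(0.0, 1.0) for _ in range(width)]

# Same seed for both runs: identical raw draws, differing only in the row re-centring.
rho_iid = final_correlation(x1, x2, depth, zero_row_sum=False)
rho_anti = final_correlation(x1, x2, depth, zero_row_sum=True)
print(f"i.i.d. weights:        rho = {rho_iid:.3f}")
print(f"zero-row-sum weights:  rho = {rho_anti:.3f}")
```

For this assumed ensemble, the same mean-field reasoning gives the centred map ρ' = (√(1 − ρ²) + ρ(π − arccos ρ) − 1)/(π − 1), whose fixed point at ρ = 0 is stable (slope (π/2)/(π − 1) ≈ 0.73). With zero biases the correlation therefore hovers near zero instead of creeping toward one; the paper's own analysis, with its own weight-correlation model and biases, may give different fixed points.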
