Beyond Folklore: A Scaling Calculus for the Design and Initialization of ReLU Networks
The work addresses the challenge of efficient network design and training for machine learning practitioners, though it appears incremental because it builds on existing initialization methods.
The paper tackles the design and initialization of ReLU neural networks by proposing a scaling calculus that assigns a scaling constant to each layer and weight. This constant is tied to the optimizability of the network, and the calculus suggests using the geometric mean of the fan-in and fan-out to set the variance of the weight initialization, potentially replacing blind experimentation.
We propose a system for calculating a "scaling constant" for layers and weights of neural networks. We relate this scaling constant to two important quantities that govern the optimizability of neural networks, and argue that a network that is "preconditioned" via scaling, in the sense that all weights have the same scaling constant, will be easier to train. This scaling calculus has a number of consequences, among them the fact that the geometric mean of the fan-in and fan-out, rather than the fan-in, fan-out, or arithmetic mean, should be used to set the variance of the weight initialization in a neural network. Our system allows for the off-line design and engineering of ReLU neural networks, potentially replacing blind experimentation.
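To make the initialization claim concrete, the following is a minimal sketch that compares the four candidate denominators for the weight variance. It is an illustration under stated assumptions, not the paper's implementation: the function names `init_variance` and `init_weights` are hypothetical, and the numerator `gain=2.0` is borrowed from the common He-style convention for ReLU layers, since the abstract states only which mean of the fan-in and fan-out to use, not the constant.

```python
import math
import random


def init_variance(fan_in, fan_out, mode="geometric", gain=2.0):
    """Return gain / n, where n is a summary of fan_in and fan_out.

    ASSUMPTION: `gain` (2.0, as in He-style ReLU scaling) is not taken
    from the abstract, which specifies only the choice of denominator.
    """
    if fan_in <= 0 or fan_out <= 0:
        raise ValueError("fan_in and fan_out must be positive")
    denominators = {
        "fan_in": float(fan_in),
        "fan_out": float(fan_out),
        "arithmetic": (fan_in + fan_out) / 2.0,
        # The choice the abstract argues for: sqrt(fan_in * fan_out).
        "geometric": math.sqrt(fan_in * fan_out),
    }
    if mode not in denominators:
        raise ValueError(f"unknown mode: {mode!r}")
    return gain / denominators[mode]


def init_weights(fan_in, fan_out, mode="geometric", gain=2.0, seed=0):
    """Sample a fan_out x fan_in matrix of zero-mean Gaussian weights."""
    rng = random.Random(seed)
    std = math.sqrt(init_variance(fan_in, fan_out, mode, gain))
    return [[rng.gauss(0.0, std) for _ in range(fan_in)]
            for _ in range(fan_out)]


if __name__ == "__main__":
    # Compare the four variance choices for a 256 -> 64 layer.
    for mode in ("fan_in", "fan_out", "arithmetic", "geometric"):
        print(mode, init_variance(256, 64, mode))
```

For a layer with fan-in 256 and fan-out 64, the geometric mean is 128, which lies between the two fans and below their arithmetic mean of 160, so the resulting variance sits between the fan-in and fan-out choices.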