LG · AI · NE · ST · ML · Jan 7, 2019

On the effect of the activation function on the distribution of hidden nodes in a deep network

arXiv:1901.02104v1 · 4 citations
Originality: Synthesis-oriented
AI Analysis

This work offers researchers theoretical insight into neural network behavior, but it is incremental: it builds on existing analyses of random networks.

The paper analyzes how the activation function affects the distribution of the lengths of the hidden-layer vectors in deep networks with random Gaussian weights and biases. It shows that, under minimal assumptions on the activation function, these lengths converge to a deterministic length map as the network width increases, and that convergence can fail for activation functions that violate the assumptions.

We analyze the joint probability distribution on the lengths of the vectors of hidden variables in different layers of a fully connected deep network, when the weights and biases are chosen randomly according to Gaussian distributions, and the input is in $\{ -1, 1\}^N$. We show that, if the activation function $φ$ satisfies a minimal set of assumptions, which hold for all activation functions we know of that are used in practice, then, as the width of the network gets large, the "length process" converges in probability to a length map that is determined as a simple function of the variances of the random weights and biases, and the activation function $φ$. We also show that this convergence may fail for $φ$ that violate our assumptions.
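The convergence described in the abstract can be illustrated numerically. The sketch below is not taken from the paper: it assumes the usual mean-field parameterization (weights with variance sigma_w^2 / N, biases with variance sigma_b^2, lengths measured as the squared norm of the pre-activations divided by N) and the corresponding recursion q_l = sigma_w^2 E[φ(sqrt(q_{l-1}) z)^2] + sigma_b^2 with z standard normal. The function names `simulate_lengths` and `length_map` are hypothetical, and the paper's exact definitions may differ.

```python
import numpy as np


def simulate_lengths(phi, width, depth, sigma_w, sigma_b, rng):
    """Normalized squared lengths |h_l|^2 / width in one random network."""
    x = rng.choice([-1.0, 1.0], size=width)  # input drawn from {-1, 1}^N
    lengths = []
    for _ in range(depth):
        # Assumed scaling: weight variance sigma_w^2 / width, bias variance sigma_b^2.
        W = rng.normal(0.0, sigma_w / np.sqrt(width), size=(width, width))
        b = rng.normal(0.0, sigma_b, size=width)
        h = W @ x + b                       # pre-activations of this layer
        lengths.append(float(h @ h) / width)
        x = phi(h)                          # activations feed the next layer
    return lengths


def length_map(phi, depth, sigma_w, sigma_b, n_nodes=80):
    """Deterministic recursion (assumed form) for the limiting lengths."""
    # Gauss-Hermite nodes/weights for the weight function exp(-z^2 / 2).
    z, w = np.polynomial.hermite_e.hermegauss(n_nodes)
    w = w / np.sqrt(2.0 * np.pi)            # normalize to a standard normal
    q = sigma_w**2 + sigma_b**2             # first layer: |x|^2 / N = 1
    out = [q]
    for _ in range(depth - 1):
        q = sigma_w**2 * float(np.sum(w * phi(np.sqrt(q) * z) ** 2)) + sigma_b**2
        out.append(q)
    return out


if __name__ == "__main__":
    relu = lambda h: np.maximum(h, 0.0)
    rng = np.random.default_rng(0)
    print("simulated :", simulate_lengths(relu, 1000, 4, 1.5, 0.5, rng))
    print("length map:", length_map(relu, 4, 1.5, 0.5))
```

Under these assumptions, increasing `width` should bring the simulated lengths closer to the deterministic values, which is the behavior the abstract describes for activation functions that satisfy the paper's conditions.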

