LG · NE · ML · May 22, 2018

Mean Field Theory of Activation Functions in Deep Neural Networks

arXiv:1805.08786v2 · 4 citations
Originality: Incremental advance
AI Analysis

The paper provides a theoretical foundation for activation functions in deep learning. The advance is incremental because it builds on existing mean-field approaches.

The authors develop a Statistical Mechanics model in which a deep neural network performs three steps: encoding, representation validation, and propagation. From the mean-field solution they derive common activation functions (Sigmoid, tanh, ReLU, and Swish). In classification tasks, Swish shows more consistent performance across architectures.
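
As a hedged illustration of how such activations can arise, here is the standard single-unit thermal average from textbook statistical mechanics. It is not necessarily the paper's exact derivation, and the symbols $s$ (unit state), $h$ (input field) and $\beta$ (inverse noise level) are assumptions introduced here. For a binary unit with Boltzmann weight $e^{\beta h s}$:

$$
\langle s \rangle_{s \in \{0,1\}} = \frac{e^{\beta h}}{1 + e^{\beta h}} = \sigma(\beta h),
\qquad
\langle s \rangle_{s \in \{-1,+1\}} = \frac{e^{\beta h} - e^{-\beta h}}{e^{\beta h} + e^{-\beta h}} = \tanh(\beta h).
$$

If the input $h$ is propagated only when a $\{0,1\}$ unit is active, the expected propagated signal is

$$
\langle h\, s \rangle = h\, \sigma(\beta h) \;\xrightarrow{\;\beta \to \infty\;}\; \max(0, h),
$$

which has the Swish form and reduces to ReLU when the noise vanishes.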

We present a Statistical Mechanics (SM) model of deep neural networks, connecting the energy-based and the feed forward networks (FFN) approach. We infer that FFN can be understood as performing three basic steps: encoding, representation validation and propagation. From the mean-field solution of the model, we obtain a set of natural activations -- such as Sigmoid, $\tanh$ and ReLU -- together with the state-of-the-art Swish; Swish represents the expected information propagating through the network and tends to ReLU in the limit of zero noise. We study the spectrum of the Hessian on an associated classification task, showing that Swish allows for more consistent performance over a wider range of network architectures.
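
The zero-noise limit mentioned in the abstract can be checked numerically. The sketch below is a minimal illustration, not the authors' code. It assumes the common parametrization Swish(x) = x · sigmoid(beta · x) and treats `beta` as an inverse noise level, so large `beta` means low noise. The helper names `swish`, `relu` and `max_gap` are hypothetical.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def swish(x, beta=1.0):
    # Assumed form: x * sigmoid(beta * x); beta plays the role of inverse noise.
    return x * sigmoid(beta * x)


def relu(x):
    return max(0.0, x)


def max_gap(beta, xs):
    # Largest absolute difference between Swish and ReLU over the grid xs.
    return max(abs(swish(x, beta) - relu(x)) for x in xs)


# Grid on [-5, 5] with step 0.01.
xs = [i / 100.0 for i in range(-500, 501)]

# The gap shrinks as beta grows, i.e. as the noise goes to zero.
gaps = [max_gap(beta, xs) for beta in (1.0, 10.0, 100.0)]
for beta, gap in zip((1.0, 10.0, 100.0), gaps):
    print(f"beta={beta:>6}: max |swish - relu| = {gap:.5f}")
```

Under this parametrization the maximum gap scales roughly as 1/beta, so the two functions coincide in the limit while Swish stays smooth at any finite noise level.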
