ML · LG · Nov 29, 2016

The empirical size of trained neural networks

arXiv:1611.09444v1 · 3 citations
Originality: Synthesis-oriented
AI Analysis

This work provides insights into why neural networks generalize well, addressing a fundamental problem in machine learning theory.

The paper investigates the empirical characteristics of trained ReLU neural networks, showing they are significantly simpler than their parameter count suggests and that this forced simplicity is crucial for their success.

ReLU neural networks define piecewise linear functions of their inputs. However, initializing and training a neural network is very different from fitting a linear spline. In this paper, we expand empirically upon previous theoretical work to demonstrate features of trained neural networks. Standard network initialization and training produce networks vastly simpler than a naive parameter count would suggest and can impart odd features to the trained network. However, we also show the forced simplicity is beneficial and, indeed, critical for the wide success of these networks.
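The piecewise linear claim in the abstract can be illustrated with a minimal sketch. This is not the paper's experimental setup: the helper names (`init_network`, `forward_with_pattern`, `count_linear_pieces_1d`), the Gaussian initialization scale, the bias scale, and the sampling interval are all assumptions chosen for illustration. The sketch builds a small fully connected ReLU network with a scalar input, records which hidden units are active at each point of a dense grid, and estimates the number of linear pieces by counting where that activation pattern changes. The estimate can then be set beside the raw parameter count.

```python
import numpy as np


def init_network(widths, rng):
    """Gaussian initialization (He-style weight scale, an assumption) for a ReLU network."""
    params = []
    for fan_in, fan_out in zip(widths[:-1], widths[1:]):
        W = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_out, fan_in))
        b = rng.normal(0.0, 0.1, size=fan_out)  # small random biases (assumed scale)
        params.append((W, b))
    return params


def forward_with_pattern(params, x):
    """Return the outputs and the on/off state of every hidden ReLU for inputs x of shape (n, d)."""
    h = x
    patterns = []
    for W, b in params[:-1]:
        pre = h @ W.T + b
        patterns.append(pre > 0)      # True where the unit is active
        h = np.maximum(pre, 0.0)      # ReLU
    W, b = params[-1]                 # final layer is affine, no ReLU
    return h @ W.T + b, np.concatenate(patterns, axis=1)


def count_linear_pieces_1d(params, lo=-5.0, hi=5.0, n=20001):
    """Estimate the number of linear pieces on [lo, hi] for a scalar-input network.

    Counts grid intervals across which the activation pattern changes, so
    kinks closer together than the grid spacing are merged into one.
    """
    xs = np.linspace(lo, hi, n).reshape(-1, 1)
    _, pattern = forward_with_pattern(params, xs)
    changes = np.any(pattern[1:] != pattern[:-1], axis=1)
    return int(changes.sum()) + 1


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    widths = [1, 8, 8, 1]  # hypothetical architecture
    params = init_network(widths, rng)
    n_params = sum(W.size + b.size for W, b in params)
    print("parameters:", n_params)
    print("estimated linear pieces on [-5, 5]:", count_linear_pieces_1d(params))
```

For a single hidden layer of width m and a scalar input, each unit can switch at most once along the line, so the count is bounded by m + 1. Deeper networks can in principle produce many more pieces, which gives a baseline for judging how simple an initialized or trained network is relative to its parameter count.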

