LG · ML · Mar 5, 2019

Implicit Regularization in Over-parameterized Neural Networks

Masayoshi Kubo, Ryotaro Banno, Hidetaka Manabe, Masataka Minoji

arXiv:1903.01997v1 · 13.725 citations

Originality: Incremental advance

AI Analysis

This paper addresses the generalization puzzle in deep learning, a question of interest mainly to researchers. The advance is incremental because it builds on existing empirical evidence.

The paper investigates how implicit regularization prevents overfitting in over-parameterized neural networks. It finds that random initialization and stochastic gradient descent control the network output between pairs of input samples so that interpolation between them stays nearly straight, which keeps the network's complexity low.

Over-parameterized neural networks generalize well in practice without any explicit regularization. Although it has not been proven yet, empirical evidence suggests that implicit regularization plays a crucial role in deep learning and prevents the network from overfitting. In this work, we introduce the gradient gap deviation and the gradient deflection as statistical measures corresponding to the network curvature and the Hessian matrix to analyze variations of network derivatives with respect to input parameters, and investigate how implicit regularization works in ReLU neural networks from both theoretical and empirical perspectives. Our result reveals that the network output between each pair of input samples is properly controlled by random initialization and stochastic gradient descent to keep interpolating between samples almost straight, which results in low complexity of over-parameterized neural networks.
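The "almost straight" interpolation claim can be probed numerically. The sketch below is not the paper's gradient gap deviation or gradient deflection, whose definitions are not given on this page. It is a simple hypothetical proxy: it samples a ReLU network along the line segment between two inputs and measures how far the output strays from the straight chord joining the two endpoint outputs. The function names, the He-style initialization, and the layer sizes are all assumptions made for illustration.

```python
import numpy as np

def init_mlp(sizes, rng):
    """Random initialization (He-style scaling; an assumption, not from the paper)."""
    weights = [rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n))
               for m, n in zip(sizes[:-1], sizes[1:])]
    biases = [np.zeros(n) for n in sizes[1:]]
    return weights, biases

def relu_mlp(x, weights, biases):
    """Forward pass: ReLU hidden layers followed by a linear output layer."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(h @ W + b, 0.0)
    return h @ weights[-1] + biases[-1]

def straightness_deviation(x_a, x_b, weights, biases, steps=101):
    """Largest gap between the network output along the segment x_a -> x_b
    and the chord joining the endpoint outputs, relative to the chord length.
    A value near 0 means the interpolation is almost straight."""
    t = np.linspace(0.0, 1.0, steps)[:, None]
    path = (1.0 - t) * x_a + t * x_b           # inputs along the segment
    out = relu_mlp(path, weights, biases)      # outputs along the segment
    chord = (1.0 - t) * out[0] + t * out[-1]   # straight line in output space
    gap = np.linalg.norm(out - chord, axis=1).max()
    scale = np.linalg.norm(out[-1] - out[0]) + 1e-12
    return gap / scale

rng = np.random.default_rng(0)
weights, biases = init_mlp([10, 256, 256, 1], rng)  # hypothetical layer sizes
x_a, x_b = rng.normal(size=10), rng.normal(size=10)
print(straightness_deviation(x_a, x_b, weights, biases))
```

Comparing this value at random initialization and after SGD training, over many sample pairs, would give a rough empirical check of the behavior the abstract describes.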
