ML, LG, Jun 13, 2018

Spurious Local Minima of Deep ReLU Neural Networks in the Neural Tangent Kernel Regime

arXiv:1806.04884v3
Originality: Incremental advance
AI Analysis

This paper addresses a fundamental optimization challenge in deep learning by showing that gradient descent avoids spurious local minima in a key theoretical setting, the Neural Tangent Kernel regime.

The paper theoretically proves that deep ReLU neural networks avoid spurious local minima in the loss landscape under the Neural Tangent Kernel regime, specifically when parameters are normally initialized and hidden layer widths approach infinity.

In this paper, we theoretically prove that deep ReLU neural networks do not lie in spurious local minima of the loss landscape under the Neural Tangent Kernel (NTK) regime, that is, in the gradient descent training dynamics of deep ReLU neural networks whose parameters are initialized by a normal distribution, in the limit as the widths of the hidden layers tend to infinity.
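
For orientation, the NTK regime named in the abstract can be written out as follows. This is a sketch in the standard NTK formulation, not the paper's own notation: the depth L, the widths n_l, the weight matrices W^{(l)}, the training set (x_i, y_i), the scalar output, the omission of bias terms, and the choice of squared loss are all assumptions made here for illustration.

```latex
% Assumed notation (not taken from the paper):
%   n_0 = input dimension, n_1, ..., n_{L-1} = hidden widths,
%   W^{(l)} has i.i.d. N(0,1) entries at initialization,
%   \sigma(z) = \max(z, 0) is the ReLU, applied entrywise.
\begin{align*}
  \alpha^{(0)}(x)   &= x, \\
  \alpha^{(l+1)}(x) &= \sigma\!\Big( \tfrac{1}{\sqrt{n_l}}\, W^{(l)} \alpha^{(l)}(x) \Big),
                       \qquad l = 0, \dots, L-2, \\
  f_\theta(x)       &= \tfrac{1}{\sqrt{n_{L-1}}}\, W^{(L-1)} \alpha^{(L-1)}(x).
\end{align*}

% Empirical tangent kernel: inner product of parameter gradients.
\[
  \Theta_\theta(x, x') = \sum_{p} \partial_{\theta_p} f_\theta(x)\, \partial_{\theta_p} f_\theta(x').
\]

% Gradient flow on the squared loss
%   \mathcal{L}(\theta) = \tfrac12 \sum_{i=1}^{N} (f_\theta(x_i) - y_i)^2
% moves the network output according to the kernel:
\[
  \frac{d}{dt} f_{\theta_t}(x)
    = - \sum_{i=1}^{N} \Theta_{\theta_t}(x, x_i)\, \big( f_{\theta_t}(x_i) - y_i \big).
\]
```

In the usual account of this regime, as the hidden widths n_1, ..., n_{L-1} tend to infinity the kernel converges to a deterministic limit that stays fixed during training, so the output dynamics above become linear. If the limiting kernel's Gram matrix on the training inputs is positive definite, the residuals f(x_i) - y_i decay to zero. This is offered as background for the abstract's claim, not as a summary of the paper's proof.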

