LG · ML · Jun 15, 2020

Understanding Global Loss Landscape of One-hidden-layer ReLU Networks, Part 2: Experiments and Analysis

arXiv:2006.09192v1 · 1 citation
Originality: Incremental advance
AI Analysis

This paper addresses the problem of understanding the optimization landscapes of neural networks and offers insights for researchers, but it is an incremental advance because it builds on prior theoretical work.

The paper investigates the probability that local minima exist in one-hidden-layer ReLU networks. It shows that this probability is very low for 1D Gaussian data, predicts that no bad differentiable local minima exist for MNIST and CIFAR-10 once some hidden neurons are activated by samples, and verifies experimentally that gradient descent is not trapped.
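The claims above are stated per differentiable cell. As a rough illustration, assume a bias-free network f(x) = Σ_j v_j ReLU(w_j · x) and identify a cell with the on/off pattern of every hidden neuron on every sample; the loss is differentiable inside the cell as long as no sample lies exactly on a neuron's boundary. The Python sketch below computes the cell of a weight matrix, checks whether any hidden neuron is activated by at least one sample, and estimates by Monte Carlo how many cells Gaussian-drawn weights visit and what fraction of draws lands in each. The function names and the use of draw frequency as a proxy for cell size are assumptions made for illustration, not the paper's procedure.

```python
import random
from collections import Counter


def activation_pattern(W, X):
    """Cell of weight matrix W (hypothetical helper): for each hidden neuron j
    and sample i, whether the pre-activation w_j . x_i is positive."""
    return tuple(
        tuple(sum(wk * xk for wk, xk in zip(w, x)) > 0 for x in X)
        for w in W
    )


def any_neuron_active(pattern):
    """True if at least one hidden neuron is activated by at least one sample."""
    return any(any(row) for row in pattern)


def estimate_cells(X, hidden, trials, seed=0):
    """Monte Carlo sketch: draw W with i.i.d. N(0, 1) entries and record which
    cell each draw falls in; the returned fractions act as a proxy for the
    relative size of each visited cell."""
    rng = random.Random(seed)
    dim = len(X[0])
    counts = Counter()
    for _ in range(trials):
        W = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(hidden)]
        counts[activation_pattern(W, X)] += 1
    return {cell: c / trials for cell, c in counts.items()}
```

With 1D inputs such as `[[1.0], [2.0], [-1.5]]` and a single bias-free neuron, only two cells exist (w > 0 and w < 0), so `estimate_cells` should report two cells with roughly equal mass; more neurons or higher-dimensional data produce many more cells.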

The existence of local minima for one-hidden-layer ReLU networks has been investigated theoretically in [8]. Building on that theory, we first analyze how likely local minima are to exist for 1D Gaussian data and how this probability varies across the whole weight space. We show that the probability is very low in most regions. We then design and implement a linear-programming-based approach to determine whether genuine local minima exist, and use it to predict whether bad local minima exist for the MNIST and CIFAR-10 datasets. We find that there are no bad differentiable local minima almost everywhere in the weight space once some hidden neurons are activated by samples. These theoretical predictions are verified experimentally by showing that gradient descent is not trapped in the cells from which it starts. We also perform experiments to explore the number and size of differentiable cells in the weight space.
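The abstract does not spell out the linear program. One way such a test can be posed, sketched here under stated assumptions rather than as the authors' formulation: take a bias-free network f(x) = Σ_j v_j ReLU(w_j · x) with squared loss, fix a cell (the on/off pattern of every neuron on every sample), and assume each output weight v_j keeps a fixed nonzero sign. Substituting z_j = v_j w_j makes the loss a convex quadratic in z inside the cell, so any differentiable local minimum there (in the z variables) is a global least-squares minimizer, and the cell contains one exactly when the affine set of least-squares solutions meets the open cell. Checking that intersection is an LP: maximize a margin t subject to every cell inequality holding with slack at least t. The function name `has_differentiable_local_min` and its arguments are hypothetical; the sketch uses NumPy and SciPy's `linprog` with the HiGHS solver.

```python
import numpy as np
from scipy.optimize import linprog


def has_differentiable_local_min(X, y, pattern, v_sign, tol=1e-9, rtol=1e-10):
    """Hypothetical LP check: does the open cell `pattern` contain a global
    minimizer of the cell's least-squares problem in z_j = v_j * w_j?

    X: (n, d) samples; y: (n,) targets.
    pattern: (m, n) booleans, pattern[j][i] = neuron j active on sample i.
    v_sign: (m,) signs (+1/-1) of the output weights, assumed fixed.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    P = np.asarray(pattern, dtype=bool)
    s = np.asarray(v_sign, dtype=float)
    m, n = P.shape
    d = X.shape[1]

    # Inside the cell f(x_i) = sum_j P[j, i] * (z_j . x_i), which is linear in z.
    A = np.zeros((n, m * d))
    for j in range(m):
        A[:, j * d:(j + 1) * d] = P[j][:, None] * X

    # All least-squares minimizers: z = z0 + N @ u, with N a basis of null(A).
    z0 = np.linalg.lstsq(A, y, rcond=None)[0]
    _, S, Vt = np.linalg.svd(A)
    rank = int(np.sum(S > rtol * S.max())) if S.size and S.max() > 0 else 0
    N = Vt[rank:].T

    # Cell constraints: sign(w_j . x_i) = sign(v_j) * sign(z_j . x_i) must be
    # +1 where active and -1 where inactive, i.e. every row of G @ z is > 0.
    G = np.zeros((m * n, m * d))
    for j in range(m):
        sigma = np.where(P[j], 1.0, -1.0)
        G[j * n:(j + 1) * n, j * d:(j + 1) * d] = (sigma * s[j])[:, None] * X

    # Variables (u, t): maximize margin t s.t. G @ (z0 + N u) >= t and t <= 1.
    k = N.shape[1]
    A_ub = np.hstack([-G @ N, np.ones((m * n, 1))])
    b_ub = G @ z0
    c = np.zeros(k + 1)
    c[-1] = -1.0
    bounds = [(None, None)] * k + [(None, 1.0)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    # A strictly positive margin means the minimizer set meets the open cell.
    return bool(res.status == 0 and -res.fun > tol)
```

For the 1D samples x = 1, 2, -1 with one neuron active only on the first two, targets (1, 2, 0) are fit exactly by w = 1, v = 1, so the check returns True; with targets (-1, -2, 0) it returns False when v is assumed positive and True when v is assumed negative. Whether a detected minimum is bad then depends on comparing its loss with the global minimum.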
