LG AI ML Jun 30, 2017

Towards Understanding Generalization of Deep Learning: Perspective of Loss Landscapes

arXiv:1706.10239v2, 237 citations
Originality: Incremental advance
AI Analysis

The paper addresses the fundamental problem of understanding generalization in deep learning, which matters to researchers and practitioners alike, but its advance is incremental because it builds on existing loss-landscape perspectives.

The paper investigates why deep neural networks generalize well despite overparameterization, finding that the loss landscape's basin of attraction for good minima is larger than for poor ones, which leads optimization to converge to solutions with good generalization. Theoretical analysis for 2-layer networks shows low-complexity solutions have small Hessian norms, supported by numerical evidence for deeper networks.

It is widely observed that deep learning models with learned parameters generalize well, even with many more model parameters than training samples. We systematically investigate the underlying reasons why deep neural networks often generalize well, and reveal the difference between the minima (with the same training error) that generalize well and those that do not. We show that it is the characteristics of the landscape of the loss function that explain the good generalization capability. For the landscape of the loss function for deep networks, the volume of the basin of attraction of good minima dominates over that of poor minima, which guarantees that optimization methods with random initialization converge to good minima. We theoretically justify our findings by analyzing 2-layer neural networks, and show that the low-complexity solutions have a small norm of the Hessian matrix with respect to model parameters. For deeper networks, extensive numerical evidence helps to support our arguments.
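To make the quantity in the abstract concrete, the following is a minimal sketch of how the norm of the Hessian of a training loss with respect to model parameters could be measured at a given parameter vector. Everything in it is an assumption for illustration and is not taken from the paper: the tiny scalar-input tanh network, the toy data, the finite-difference scheme, the choice of the Frobenius norm, and all function names.

```python
import math
import random


def predict(theta, x):
    """Hypothetical 2-layer network with scalar input: sum_k a_k * tanh(w_k * x).

    theta holds the hidden weights w followed by the output weights a.
    """
    m = len(theta) // 2
    w, a = theta[:m], theta[m:]
    return sum(a_k * math.tanh(w_k * x) for w_k, a_k in zip(w, a))


def loss(theta, data):
    """Mean squared error over (x, y) pairs."""
    return sum((predict(theta, x) - y) ** 2 for x, y in data) / len(data)


def hessian(f, theta, eps=1e-4):
    """Central finite-difference approximation of the Hessian of f at theta."""
    n = len(theta)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            def shifted(di, dj):
                # Perturb coordinates i and j (the same coordinate when i == j).
                t = list(theta)
                t[i] += di
                t[j] += dj
                return f(t)

            value = (shifted(eps, eps) - shifted(eps, -eps)
                     - shifted(-eps, eps) + shifted(-eps, -eps)) / (4 * eps ** 2)
            H[i][j] = H[j][i] = value
    return H


def frobenius_norm(H):
    """Square root of the sum of squared entries."""
    return math.sqrt(sum(v * v for row in H for v in row))


# Toy usage: Hessian norm of the loss at an arbitrary (untrained) parameter vector.
rng = random.Random(0)
data = [(x / 4.0, math.sin(x / 4.0)) for x in range(-4, 5)]
theta = [rng.uniform(-1.0, 1.0) for _ in range(6)]  # 3 hidden units
H = hessian(lambda t: loss(t, data), theta)
print("Frobenius norm of the Hessian:", frobenius_norm(H))
```

Evaluating `frobenius_norm` at two minima with the same training error is one way to compare their local curvature. For networks of realistic size an automatic-differentiation library would be the more practical choice, since finite differences scale quadratically in the number of parameters.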

