ML · LG · Jan 10, 2020

Data-Dependence of Plateau Phenomenon in Learning with Neural Network --- Statistical Mechanical Analysis

arXiv:2001.03371v1 · 42 citations
AI Analysis

This work addresses a gap between classical theory and practical observations in machine learning, providing insights into data-dependent learning dynamics, though it is incremental in nature.

The paper tackles the discrepancy between theoretical predictions of the plateau phenomenon in neural network learning and its rare occurrence in modern deep learning, showing that data with small and dispersed eigenvalues in its covariance matrix tends to make the plateau less noticeable.

The plateau phenomenon, wherein the loss value stops decreasing during the process of learning, has been reported by various researchers. The phenomenon was actively investigated in the 1990s and found to be due to the fundamental hierarchical structure of neural network models. Since then, the phenomenon has been thought to be inevitable. However, it seldom occurs in the context of recent deep learning, so there is a gap between theory and reality. In this paper, using a statistical mechanical formulation, we clarify the relationship between the plateau phenomenon and the statistical properties of the data learned. It is shown that data whose covariance has small and dispersed eigenvalues tend to make the plateau phenomenon inconspicuous.
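To make the setting concrete, the following is a minimal sketch of a toy teacher-student experiment in which the input covariance spectrum can be varied. It is not the paper's code or its exact model: the function name `simulate_online_sgd`, the `tanh` activation, the fixed second-layer weights, and all hyperparameters are assumptions chosen for illustration.

```python
import numpy as np

def simulate_online_sgd(eigenvalues, hidden=2, steps=20000, lr=0.1,
                        record_every=200, seed=0):
    """Online SGD for a two-layer student learning from a fixed teacher.

    Inputs are drawn from N(0, diag(eigenvalues)). Returns the test loss
    recorded every `record_every` steps. All settings are illustrative
    assumptions, not values taken from the paper.
    """
    rng = np.random.default_rng(seed)
    eigenvalues = np.asarray(eigenvalues, dtype=float)
    n = eigenvalues.size
    scale = np.sqrt(eigenvalues)  # per-coordinate standard deviation

    def forward(w, x):
        # Sum of hidden-unit outputs; second-layer weights are fixed to 1.
        return np.tanh(x @ w.T / np.sqrt(n)).sum(axis=-1)

    teacher = rng.standard_normal((hidden, n))
    # Small initial weights keep the student's hidden units nearly identical.
    student = 1e-3 * rng.standard_normal((hidden, n))

    # Fixed held-out set used to estimate the generalization loss.
    x_test = rng.standard_normal((2000, n)) * scale
    y_test = forward(teacher, x_test)

    losses = []
    for step in range(steps):
        if step % record_every == 0:
            residual = forward(student, x_test) - y_test
            losses.append(0.5 * np.mean(residual ** 2))
        x = rng.standard_normal(n) * scale       # one fresh sample per step
        pre = student @ x / np.sqrt(n)           # hidden pre-activations
        err = np.tanh(pre).sum() - forward(teacher, x)
        # Gradient of 0.5 * err**2 with respect to each student weight vector.
        grad = err * (1.0 - np.tanh(pre) ** 2)[:, None] * x[None, :] / np.sqrt(n)
        student -= lr * grad
    return np.array(losses)

if __name__ == "__main__":
    n = 100
    isotropic = simulate_online_sgd(np.ones(n))
    dispersed = simulate_online_sgd(np.logspace(-2, 0, n))
    print(isotropic[:5], dispersed[:5])
```

Plotting the two returned curves on a log scale is one way to compare how long the loss stalls under each spectrum. Whether a plateau is visible in this toy run depends on the assumed settings, so it should be read as an illustration of the setup rather than a reproduction of the paper's results.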

Code Implementations: 1 repo
