LG · AI · CV · OC · ML · Apr 12, 2021

A Recipe for Global Convergence Guarantee in Deep Neural Networks

arXiv:2104.05785v2 · 14 citations
Originality: Highly original
AI Analysis

This paper provides a theoretical foundation for deep learning convergence in practical settings, addressing a key gap for researchers and practitioners, though the contribution is incremental in that it builds on existing work.

The paper tackles the lack of global convergence guarantees for gradient descent in practical deep networks beyond the neural tangent kernel regime. It proposes an algorithm that has such guarantees under a verifiable expressivity condition. The condition is shown to hold theoretically for certain architectures and numerically for ResNets on image datasets, and the algorithm's generalization performance is comparable to that of heuristic methods.

Existing global convergence guarantees for (stochastic) gradient descent do not apply to practical deep networks in the practical regime of deep learning beyond the neural tangent kernel (NTK) regime. This paper proposes an algorithm that is guaranteed to converge globally in the practical regime beyond the NTK regime, under a verifiable condition called the expressivity condition. The expressivity condition is defined to be both data-dependent and architecture-dependent, which is the key property that makes our results applicable to practical settings beyond the NTK regime. On the one hand, the expressivity condition is theoretically proven to hold data-independently for fully-connected deep neural networks with narrow hidden layers and a single wide layer. On the other hand, the expressivity condition is numerically shown to hold data-dependently for a deep (convolutional) ResNet with batch normalization on various standard image datasets. We also show that the proposed algorithm has generalization performance comparable to that of the heuristic algorithm, with the same hyper-parameters and total number of iterations. Therefore, the proposed algorithm can be viewed as a step towards providing theoretical guarantees for deep learning in the practical regime.
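The abstract does not spell out the expressivity condition, so the following is only a rough sketch of what a numerically verifiable, data-dependent and architecture-dependent condition could look like. It assumes, purely for illustration, that the check is a rank test on the last hidden layer's feature matrix (with a bias column appended) over the n training samples. The function names `last_hidden_features` and `full_rank_check`, the tolerance, and the toy network sizes are all hypothetical and are not taken from the paper.

```python
import numpy as np

def last_hidden_features(X, weights):
    """Forward pass through a bias-free ReLU MLP; returns the last hidden layer's outputs."""
    h = X
    for W in weights:
        h = np.maximum(h @ W, 0.0)
    return h

def full_rank_check(H, tol=1e-8):
    """Hypothetical rank test (an assumption, not the paper's stated definition):
    returns True if [H, 1] has rank n, where n is the number of samples (rows)."""
    n = H.shape[0]
    H1 = np.hstack([H, np.ones((n, 1))])       # append a bias column
    s = np.linalg.svd(H1, compute_uv=False)    # singular values, descending
    # Rank n requires at least n singular values and a non-negligible n-th one.
    return bool(s.size >= n and s[n - 1] > tol * s[0])

# Toy setting: a narrow hidden layer followed by a single wide layer.
rng = np.random.default_rng(0)
n, d = 20, 5
X = rng.standard_normal((n, d))
weights = [rng.standard_normal((d, 8)), rng.standard_normal((8, 64))]
H = last_hidden_features(X, weights)
print("rank-type condition holds on this toy data:", full_rank_check(H))
```

The check depends on both the data `X` and the architecture (through `weights`), which mirrors the data-dependent and architecture-dependent character described above. A layer narrower than n - 1 units can never pass this particular test, which is one reason a single wide layer matters in this sketch.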

