LG, NA, ST | Dec 15, 2020

Strong overall error analysis for the training of artificial neural networks via random initializations

arXiv:2012.08443v1 | 3 citations
AI Analysis

This work tightens the theoretical convergence rate estimates for the overall error in deep supervised learning, which matters to researchers in mathematical deep learning.

This paper improves convergence rate estimates for the overall error in deep supervised learning. It shows that the neural network depth needs to grow much more slowly than in earlier estimates to achieve the same approximation rate. The results hold for arbitrary stochastic optimization algorithms with i.i.d. random initializations.

Although deep learning based approximation algorithms have been applied very successfully to numerous problems, at the moment the reasons for their performance are not entirely understood from a mathematical point of view. Recently, estimates for the convergence of the overall error have been obtained in the situation of deep supervised learning, but with an extremely slow rate of convergence. In this note we partially improve on these estimates. More specifically, we show that the depth of the neural network only needs to increase much more slowly in order to obtain the same rate of approximation. The results hold in the case of an arbitrary stochastic optimization algorithm with i.i.d. random initializations.
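To make the setting concrete, the following is a minimal sketch of training with i.i.d. random initializations: the same stochastic optimizer is started from several independent random parameter draws. Everything specific here is an assumption for illustration and does not come from the paper: the one-neuron ReLU network, the toy data, the learning rate, the names `sgd` and `train_with_restarts`, and in particular the rule of keeping the run with the smallest empirical risk.

```python
import random


def relu(z):
    return z if z > 0.0 else 0.0


def predict(params, x):
    # Hypothetical toy network: one hidden ReLU neuron, scalar input and output.
    w, b, v = params
    return v * relu(w * x + b)


def empirical_risk(params, data):
    # Mean squared error over the training sample.
    return sum((predict(params, x) - y) ** 2 for x, y in data) / len(data)


def sgd(params, data, rng, steps=2000, lr=0.01):
    # Plain SGD with batch size 1; any stochastic optimizer could stand in here.
    w, b, v = params
    for _ in range(steps):
        x, y = rng.choice(data)
        pre = w * x + b
        act = relu(pre)
        err = v * act - y
        gate = 1.0 if pre > 0.0 else 0.0  # derivative of ReLU at pre
        grad_w = 2.0 * err * v * gate * x
        grad_b = 2.0 * err * v * gate
        grad_v = 2.0 * err * act
        w, b, v = w - lr * grad_w, b - lr * grad_b, v - lr * grad_v
    return (w, b, v)


def train_with_restarts(data, num_inits=8, seed=0):
    rng = random.Random(seed)
    candidates = []
    for _ in range(num_inits):
        # Each start is drawn independently from the same uniform distribution.
        init = tuple(rng.uniform(-1.0, 1.0) for _ in range(3))
        trained = sgd(init, data, rng)
        candidates.append((empirical_risk(trained, data), trained))
    # Assumed selection rule: keep the run with the smallest empirical risk.
    best = min(candidates, key=lambda c: c[0])
    return best, [risk for risk, _ in candidates]


# Toy regression target (an assumption): y = max(0, 2x - 0.5) on [0, 1].
data = [(i / 10.0, max(0.0, 2.0 * (i / 10.0) - 0.5)) for i in range(11)]
(best_risk, best_params), all_risks = train_with_restarts(data)
print("risk per initialization:", all_risks)
print("selected risk:", best_risk)
```

Individual runs may stall, for example when the ReLU neuron starts out inactive on the whole sample, which is why several independent starts are drawn before one run is selected.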
