Overall error analysis for the training of deep neural networks via stochastic gradient descent with random initialisation
This work addresses the foundational problem of mathematically analyzing deep learning algorithms, a question of interest mainly to researchers. It is incremental, however, because it builds on existing methods and has limited practical impact.
The authors address the lack of a rigorous mathematical understanding of deep learning algorithms by providing a full error analysis for deep neural networks trained with stochastic gradient descent and random initialisation. The convergence speed they obtain suffers from the curse of dimensionality and is presumably far from optimal.
In spite of the accomplishments of deep learning-based algorithms in numerous applications and very broad corresponding research interest, at the moment there is still no rigorous understanding of the reasons why such algorithms produce useful results in certain situations. A thorough mathematical analysis of deep learning-based algorithms seems to be crucial in order to improve our understanding and to make their implementation more effective and efficient. In this article we provide a mathematically rigorous full error analysis of deep learning-based empirical risk minimisation with quadratic loss function in the probabilistically strong sense, where the underlying deep neural networks are trained using stochastic gradient descent with random initialisation. The convergence speed we obtain is presumably far from optimal and suffers from the curse of dimensionality. To the best of our knowledge, we establish, however, the first full error analysis in the scientific literature for a deep learning-based algorithm in the probabilistically strong sense and, moreover, the first full error analysis in the scientific literature for a deep learning-based algorithm where stochastic gradient descent with random initialisation is the employed optimisation method.
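To make the setting concrete, the following is a minimal sketch of empirical risk minimisation with quadratic loss, in which a small fully connected network is trained by mini-batch stochastic gradient descent from a random initialisation. It is an illustration only and does not reproduce the algorithm, constants, or assumptions analysed in the article. The ReLU activation, the uniform initialisation on [-scale, scale], the layer widths, the target function x^2, and all hyperparameters are assumptions chosen for the example, and the function names are hypothetical.

```python
import random

def init_params(widths, rng, scale=0.5):
    """Random initialisation: weights and biases uniform on [-scale, scale] (assumed law)."""
    params = []
    for n_in, n_out in zip(widths[:-1], widths[1:]):
        W = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
        b = [rng.uniform(-scale, scale) for _ in range(n_out)]
        params.append((W, b))
    return params

def forward(params, x):
    """Return the activations of all layers; ReLU on hidden layers, identity on the output."""
    acts = [x]
    for k, (W, b) in enumerate(params):
        z = [sum(w * a for w, a in zip(row, acts[-1])) + bi for row, bi in zip(W, b)]
        if k < len(params) - 1:
            z = [max(0.0, v) for v in z]
        acts.append(z)
    return acts

def empirical_risk(params, data):
    """Mean quadratic loss of the scalar network output over the sample."""
    return sum((forward(params, x)[-1][0] - y) ** 2 for x, y in data) / len(data)

def sgd_step(params, batch, lr):
    """One gradient step on the quadratic loss averaged over a mini-batch (backpropagation)."""
    grads = [([[0.0] * len(W[0]) for _ in W], [0.0] * len(b)) for W, b in params]
    for x, y in batch:
        acts = forward(params, x)
        # Derivative of the averaged quadratic loss with respect to the output.
        delta = [2.0 * (acts[-1][0] - y) / len(batch)]
        for k in reversed(range(len(params))):
            W, _ = params[k]
            gW, gb = grads[k]
            for i, d in enumerate(delta):
                gb[i] += d
                for j, a in enumerate(acts[k]):
                    gW[i][j] += d * a
            if k > 0:
                # Propagate through W and the ReLU that produced acts[k].
                delta = [
                    sum(W[i][j] * delta[i] for i in range(len(delta))) if acts[k][j] > 0.0 else 0.0
                    for j in range(len(acts[k]))
                ]
    # Build new parameter lists so the input parameters are left unchanged.
    return [
        ([[w - lr * g for w, g in zip(row, grow)] for row, grow in zip(W, gW)],
         [bi - lr * g for bi, g in zip(b, gb)])
        for (W, b), (gW, gb) in zip(params, grads)
    ]

def train(data, widths, steps, batch_size, lr, seed):
    """Random initialisation followed by a fixed number of mini-batch SGD steps."""
    rng = random.Random(seed)
    params = init_params(widths, rng)
    for _ in range(steps):
        batch = rng.sample(data, batch_size)
        params = sgd_step(params, batch, lr)
    return params

if __name__ == "__main__":
    data_rng = random.Random(0)
    # Assumed toy regression problem: inputs uniform on [0, 1], targets x^2.
    sample = [([u], u * u) for u in (data_rng.random() for _ in range(200))]
    before = empirical_risk(init_params([1, 8, 8, 1], random.Random(1)), sample)
    trained = train(sample, widths=[1, 8, 8, 1], steps=1000, batch_size=8, lr=0.01, seed=1)
    print("empirical risk at initialisation:", before)
    print("empirical risk after SGD:", empirical_risk(trained, sample))
```

The sketch only measures the empirical risk on the training sample. A full error analysis of the kind described above would additionally have to control how well the network class can approximate the target and how far the empirical risk is from the true risk, neither of which this example attempts.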