The global optimum of shallow neural network is attained by ridgelet transform
The result provides a theoretical foundation for understanding optimization in neural networks, though the contribution is incremental in that it builds on existing ridgelet transform concepts.
The authors prove that the global minimum of the backpropagation training problem for shallow neural networks with an arbitrary nonlinear activation is given by the ridgelet transform; computational experiments reveal a similarity between the distribution of hidden parameters after training and the ridgelet spectrum.
We prove that the global minimum of the backpropagation (BP) training problem of neural networks with an arbitrary nonlinear activation is given by the ridgelet transform. A series of computational experiments shows that there exists an interesting similarity between the scatter plot of hidden parameters in a shallow neural network after BP training and the spectrum of the ridgelet transform. By introducing a continuous model of neural networks, we reduce the training problem to a convex optimization in an infinite-dimensional Hilbert space, and obtain an explicit expression of the global optimizer via the ridgelet transform.
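
The reduction described in the abstract can be sketched in formulas. This is only an illustrative outline under stated assumptions, not the paper's exact formulation: the symbols $S$, $R$, $\gamma$, $\sigma$, $\rho$, $\beta$ and the choice of a squared-error loss with an $L^2$ penalty are assumed here, and the precise function spaces and weights are left unspecified.

```latex
% Continuous model of a shallow network: the finite sum over hidden units
% is replaced by an integral over hidden parameters (a, b), with the
% output weights collected into a coefficient function \gamma(a, b).
S[\gamma](x) = \int_{\mathbb{R}^m \times \mathbb{R}} \gamma(a, b)\, \sigma(a \cdot x - b)\, \mathrm{d}a\, \mathrm{d}b

% Assumed training objective: squared error plus an L^2 penalty, \beta > 0.
% Because S is linear in \gamma, J is a convex quadratic functional of \gamma.
J[\gamma] = \| S[\gamma] - f \|_{L^2}^2 + \beta\, \| \gamma \|_{L^2}^2

% If S is a bounded operator between Hilbert spaces, the unique minimizer
% solves the normal equation, where S^* denotes the adjoint of S.
(S^* S + \beta I)\, \gamma^\star = S^* f

% Ridgelet transform of f with respect to a function \rho.
R[f; \rho](a, b) = \int_{\mathbb{R}^m} f(x)\, \overline{\rho(a \cdot x - b)}\, \mathrm{d}x

% Formally (in unweighted L^2), the adjoint of S has the same shape as a
% ridgelet transform taken with \rho = \sigma.
S^*[f](a, b) = \int_{\mathbb{R}^m} f(x)\, \overline{\sigma(a \cdot x - b)}\, \mathrm{d}x
```

Under these assumptions, the training problem is convex in $\gamma$ even though it is non-convex in the hidden parameters of a finite network, which is why an explicit global optimizer expressed through a ridgelet-type transform is plausible.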