A convergence result of a continuous model of deep learning via Łojasiewicz--Simon inequality
This work provides a theoretical foundation for analyzing the asymptotic behavior of nonconvex optimization in deep learning, but the contribution is incremental: it extends existing mathematical tools to one specific model.
The paper proves convergence for a continuous deep learning model. It establishes the existence of a minimizer and of a curve of maximal slope, then shows that the Wasserstein-type gradient flow converges to a critical point as time goes to infinity. The proof relies on the Łojasiewicz--Simon inequality, which is derived under analyticity assumptions.
This study focuses on a Wasserstein-type gradient flow, which represents an optimization process of a continuous model of a Deep Neural Network (DNN). First, we establish the existence of a minimizer for an average loss of the model under $L^2$-regularization. Subsequently, we show the existence of a curve of maximal slope of the loss. Our main result is the convergence of the flow to a critical point of the loss as time goes to infinity. An essential step in proving this result is establishing the Łojasiewicz--Simon gradient inequality for the loss. We derive this inequality by assuming the analyticity of NNs and loss functions. Our proofs offer a new approach for analyzing the asymptotic behavior of Wasserstein-type gradient flows for nonconvex functionals.
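To illustrate why a Łojasiewicz--Simon gradient inequality leads to convergence, the following sketch gives the generic form of the argument in a metric setting. It is not the paper's exact statement. The notation is assumed for illustration and does not come from the text above: $E$ is the loss, $\mu_t$ the curve of maximal slope, $\mu^*$ a critical point, $|\partial E|$ the metric slope, $|\mu'|$ the metric derivative, and $C > 0$, $\theta \in (0, 1/2]$ are constants. Suppose that near $\mu^*$

$$
|E(\mu) - E(\mu^*)|^{1-\theta} \le C \, |\partial E|(\mu).
$$

Along a curve of maximal slope, the energy identity reads

$$
\frac{d}{dt} E(\mu_t) = -|\partial E|^2(\mu_t) = -|\mu'|^2(t).
$$

Assuming $E(\mu_t) > E(\mu^*)$ and that $\mu_t$ stays in the neighborhood where the inequality holds, combining the two gives

$$
-\frac{d}{dt} \bigl( E(\mu_t) - E(\mu^*) \bigr)^{\theta}
= \theta \bigl( E(\mu_t) - E(\mu^*) \bigr)^{\theta - 1} |\partial E|^2(\mu_t)
\ge \frac{\theta}{C} \, |\partial E|(\mu_t)
= \frac{\theta}{C} \, |\mu'|(t).
$$

Integrating from $t_0$ to infinity yields

$$
\int_{t_0}^{\infty} |\mu'|(t) \, dt \le \frac{C}{\theta} \bigl( E(\mu_{t_0}) - E(\mu^*) \bigr)^{\theta} < \infty,
$$

so the curve has finite length and therefore converges in a complete metric space. Showing that the trajectory actually remains in the neighborhood where the inequality applies requires a separate argument, which this sketch omits.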