Finite-Sum Optimization: A New Perspective for Convergence to a Global Solution
This work addresses the problem of global optimization in non-convex DNN training for machine learning practitioners, offering a new theoretical direction but is incremental as it builds on existing assumptions.
The paper tackles the challenge of guaranteeing convergence to a global minimum in training deep neural networks by proposing a reformulation and recursive algorithmic framework, achieving convergence to an ε-global minimum with Õ(1/ε³) gradient computations under bounded assumptions.
Deep neural networks (DNNs) have shown great success in many machine learning tasks. Their training is challenging since the loss surface of the network architecture is generally non-convex, or even non-smooth. How and under what assumptions is guaranteed convergence to a \textit{global} minimum possible? We propose a reformulation of the minimization problem allowing for a new recursive algorithmic framework. By using bounded style assumptions, we prove convergence to an $\varepsilon$-(global) minimum using $\mathcal{\tilde{O}}(1/\varepsilon^3)$ gradient computations. Our theoretical foundation motivates further study, implementation, and optimization of the new algorithmic framework and further investigation of its non-standard bounded style assumptions. This new direction broadens our understanding of why and under what circumstances training of a DNN converges to a global minimum.