Global Convergence of Four-Layer Matrix Factorization under Random Initialization
This addresses a theoretical gap in understanding deep learning dynamics, but it is incremental as it extends prior work from two-layer to four-layer cases.
The paper tackles the lack of global convergence guarantees for deep matrix factorization under random initialization by providing a polynomial-time global convergence guarantee for four-layer matrix factorization with gradient descent, given specific conditions on the target matrix and balanced regularization.
Gradient descent dynamics on the deep matrix factorization problem is extensively studied as a simplified theoretical model for deep neural networks. Although the convergence theory for two-layer matrix factorization is well-established, no global convergence guarantee for general deep matrix factorization under random initialization has been established to date. To address this gap, we provide a polynomial-time global convergence guarantee for randomly initialized gradient descent on four-layer matrix factorization, given certain conditions on the target matrix and a standard balanced regularization term. Our analysis employs new techniques to show saddle-avoidance properties of gradient decent dynamics, and extends previous theories to characterize the change in eigenvalues of layer weights.