LG, OC · Aug 30, 2022

Convergence Rates of Training Deep Neural Networks via Alternating Minimization Methods

arXiv:2208.14318v2 · 6 citations · h-index: 13
Originality: Synthesis-oriented
AI Analysis

This work provides theoretical insights into optimization methods for deep learning, but it is incremental as it builds on existing alternating minimization frameworks.

The authors analyze convergence rates for training deep neural networks with alternating minimization methods. They derive local convergence rates that depend on the KL exponent and establish local R-linear convergence under a stronger $j$-step sufficient decrease condition.

Training deep neural networks (DNNs) is an important and challenging optimization problem in machine learning due to its non-convexity and non-separable structure. Alternating minimization (AM) approaches split the composition structure of DNNs and have drawn great interest in the deep learning and optimization communities. In this paper, we propose a unified framework for analyzing the convergence rate of AM-type network training methods. Our analysis is based on non-monotone $j$-step sufficient decrease conditions and the Kurdyka-Łojasiewicz (KL) property, which relaxes the requirement of designing descent algorithms. We give the detailed local convergence rate as the KL exponent $\theta$ varies in $[0,1)$. Moreover, local R-linear convergence is discussed under a stronger $j$-step sufficient decrease condition.
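To make the idea of splitting the composition structure concrete, here is a minimal sketch of an AM-type scheme on a two-layer linear network. It is not the paper's algorithm: the auxiliary variable `A`, the penalty weight `gamma`, the ridge weight `lam`, and the function names are all assumptions chosen for illustration, and the nonlinearity is dropped so that every block update has a closed form. Because each block is minimized exactly, the objective decreases monotonically, which is the simple 1-step special case of the non-monotone $j$-step sufficient decrease condition described above.

```python
import numpy as np


def am_objective(W1, A, W2, X, Y, gamma, lam):
    """Penalized splitting objective (illustrative, not taken from the paper)."""
    fit = 0.5 * np.sum((W2 @ A - Y) ** 2)            # output layer fits the labels
    link = 0.5 * gamma * np.sum((A - W1 @ X) ** 2)   # A should match the first layer output
    reg = 0.5 * lam * (np.sum(W1 ** 2) + np.sum(W2 ** 2))
    return fit + link + reg


def alternating_minimization(X, Y, hidden=4, gamma=1.0, lam=1e-2, iters=50, seed=0):
    """Cycle through exact minimization over W2, A, and W1 in turn."""
    rng = np.random.default_rng(seed)
    d = X.shape[0]
    m = Y.shape[0]
    W1 = rng.standard_normal((hidden, d))
    W2 = rng.standard_normal((m, hidden))
    A = W1 @ X                                       # start with the link term at zero
    I_h, I_d = np.eye(hidden), np.eye(d)

    history = [am_objective(W1, A, W2, X, Y, gamma, lam)]
    for _ in range(iters):
        # W2 block: ridge regression of Y on A.
        W2 = np.linalg.solve(A @ A.T + lam * I_h, A @ Y.T).T
        # A block: balances the fit term against the link term.
        A = np.linalg.solve(W2.T @ W2 + gamma * I_h, W2.T @ Y + gamma * W1 @ X)
        # W1 block: ridge regression of A on X.
        W1 = np.linalg.solve(gamma * X @ X.T + lam * I_d, gamma * X @ A.T).T
        history.append(am_objective(W1, A, W2, X, Y, gamma, lam))
    return W1, A, W2, np.array(history)
```

Calling `alternating_minimization(X, Y)` with `X` of shape `(d, n)` and `Y` of shape `(m, n)` returns the final blocks together with the objective value after each sweep. Plotting the successive decreases in that history on a log scale is one informal way to look at the local rate; a roughly straight line would be consistent with linear convergence, though this toy problem says nothing about the rates proved in the paper.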

