Convergence of Decentralized Stochastic Subgradient-based Methods for Nonsmooth Nonconvex Functions
This work addresses the challenge of ensuring convergence in the decentralized training of nonsmooth neural networks. It is an incremental improvement for machine learning practitioners dealing with nonconvex optimization.
The paper studies decentralized stochastic subgradient-based methods for minimizing nonsmooth nonconvex functions without Clarke regularity, particularly in the training of nonsmooth neural networks. It proposes a general framework that unifies methods such as DSGD, DSGD-T, and DSGD-M, and proves that the iterates converge asymptotically to a stable set under sufficiently small and diminishing step-sizes. This provides the first convergence guarantees for these methods in this setting.
In this paper, we focus on decentralized stochastic subgradient-based methods for minimizing nonsmooth nonconvex functions without Clarke regularity, especially in the decentralized training of nonsmooth neural networks. We propose a general framework that unifies various decentralized subgradient-based methods, such as decentralized stochastic subgradient descent (DSGD), DSGD with gradient tracking (DSGD-T), and DSGD with momentum (DSGD-M). To establish the convergence properties of the proposed framework, we relate the discrete iterates to the trajectories of a continuous-time differential inclusion, which is assumed to have a coercive Lyapunov function with a stable set $\mathcal{A}$. We prove that the iterates converge asymptotically to the stable set $\mathcal{A}$ with sufficiently small and diminishing step-sizes. These results provide the first convergence guarantees for some well-recognized decentralized stochastic subgradient-based methods without Clarke regularity of the objective function. Preliminary numerical experiments demonstrate that the proposed framework yields highly efficient decentralized stochastic subgradient-based methods with convergence guarantees in the training of nonsmooth neural networks.
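To make the basic DSGD iteration concrete, here is a minimal sketch on a toy problem. It is not the paper's framework or experimental setup. Everything specific in it is an assumption chosen for illustration: the ring topology with uniform 1/3 mixing weights, the scalar local losses f_i(x) = | |x| - a_i | (nonsmooth, nonconvex, and not Clarke regular at x = 0 because of the concave kink), the Gaussian noise model, the step-size schedule 1/sqrt(k+1), and the helper names `local_subgradient`, `ring_average`, and `dsgd`.

```python
import math
import random


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def local_subgradient(x, a, rng, noise=0.1):
    """Noisy subgradient of the toy local loss f(x) = | |x| - a |.

    The chain-rule expression sign(|x| - a) * sign(x) is used; Gaussian
    noise models the stochastic oracle (an assumption of this sketch).
    """
    return sign(abs(x) - a) * sign(x) + noise * rng.gauss(0.0, 1.0)


def ring_average(x):
    """One gossip step on a ring: each agent averages itself and its two
    neighbours with weight 1/3 (a doubly stochastic mixing matrix)."""
    n = len(x)
    return [(x[i - 1] + x[i] + x[(i + 1) % n]) / 3.0 for i in range(n)]


def dsgd(targets, x0=0.5, steps=5000, seed=0):
    """Run toy DSGD; agent i privately holds targets[i]. Returns all iterates."""
    rng = random.Random(seed)
    x = [x0] * len(targets)  # every agent starts from the same point
    for k in range(steps):
        eta = 1.0 / math.sqrt(k + 1)  # diminishing step-size (assumed schedule)
        # Each agent evaluates a noisy subgradient at its own current iterate.
        g = [local_subgradient(xi, ai, rng) for xi, ai in zip(x, targets)]
        # Mix with neighbours, then take the local subgradient step.
        mixed = ring_average(x)
        x = [m - eta * gi for m, gi in zip(mixed, g)]
    return x


if __name__ == "__main__":
    final = dsgd([1.0, 2.0, 3.0, 4.0, 5.0])
    print("agent iterates:", [round(v, 3) for v in final])
```

With the positive starting point used here, the agents should reach approximate consensus near x = 3, a minimizer of the summed loss. Since the loss is symmetric in x, a negative start would be expected to settle near x = -3 instead. Gradient tracking and momentum, as in DSGD-T and DSGD-M, are not shown in this sketch.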