Adam-family Methods for Nonsmooth Optimization with Convergence Guarantees
This work addresses the convergence of Adam-family methods when training nonsmooth neural networks. Its theoretical guarantees could benefit machine learning practitioners, though the contribution is incremental because it builds on existing Adam-family methods.
The paper tackles the problem of training nonsmooth neural networks by introducing a two-timescale framework for Adam-family methods, proving convergence guarantees under mild assumptions and demonstrating high efficiency and robustness in numerical experiments.
In this paper, we present a comprehensive study of the convergence properties of Adam-family methods for nonsmooth optimization, especially in the training of nonsmooth neural networks. We introduce a novel framework that adopts a two-timescale updating scheme and prove its convergence under mild assumptions. The framework encompasses various popular Adam-family methods and thus provides convergence guarantees for these methods in training nonsmooth neural networks. Furthermore, we develop stochastic subgradient methods that incorporate gradient clipping for training nonsmooth neural networks under heavy-tailed noise. Through our framework, we show that the proposed methods converge even when the evaluation noise is only assumed to be integrable. Extensive numerical experiments demonstrate the high efficiency and robustness of the proposed methods.
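To make the ingredients named in the abstract concrete, the following is a minimal sketch of an Adam-style stochastic subgradient iteration with gradient clipping. It is not the paper's algorithm. The test objective f(x) = ||x||_1, the step-size schedules eta_k = 0.5 / k^0.75 and theta_k = 1 / k^0.5, the clipping threshold tau, and the function names are all illustrative assumptions. The noise is Student-t with 1.5 degrees of freedom, which has a finite mean but infinite variance, as a stand-in for integrable heavy-tailed noise.

```python
import numpy as np


def l1_subgradient(x):
    # One valid subgradient of f(x) = ||x||_1; np.sign returns 0 at the kink x_i = 0.
    return np.sign(x)


def clip(g, tau):
    # Rescale g so that its Euclidean norm does not exceed tau.
    norm = np.linalg.norm(g)
    return g if norm <= tau else g * (tau / norm)


def adam_style_clipped(x0, steps=5000, tau=2.0, eps=1e-8, seed=0):
    # Hypothetical helper: Adam-style update driven by clipped noisy subgradients.
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    m = np.zeros_like(x)  # first-moment (momentum) estimate
    v = np.zeros_like(x)  # second-moment estimate
    for k in range(1, steps + 1):
        eta = 0.5 / k ** 0.75   # step size for the iterate (assumed schedule)
        theta = 1.0 / k ** 0.5  # step size for the moment estimates (assumed schedule)
        # Student-t noise with 1.5 degrees of freedom: finite mean, infinite variance.
        noise = 0.5 * rng.standard_t(1.5, size=x.shape)
        g = clip(l1_subgradient(x) + noise, tau)
        m += theta * (g - m)      # track the clipped subgradient
        v += theta * (g * g - v)  # track its elementwise square
        x -= eta * m / (np.sqrt(v) + eps)  # coordinate-wise normalized step
    return x


if __name__ == "__main__":
    x0 = np.array([3.0, -2.0, 1.5])
    x_final = adam_style_clipped(x0)
    print("initial objective:", np.abs(x0).sum())
    print("final objective:  ", np.abs(x_final).sum())
```

In this sketch the iterate step size eta_k decays faster than the moment step size theta_k, so the moment estimates evolve on a faster timescale than the iterate. Clipping keeps every update bounded even when an individual noise sample is very large.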