Provably Convergent Decentralized Optimization over Directed Graphs under Generalized Smoothness
This work addresses decentralized learning in heterogeneous data environments for large-scale systems. Its framework applies more broadly than existing methods because it does not require bounded gradient dissimilarity.
The paper studies decentralized optimization under generalized $(L_0, L_1)$-smoothness, which accommodates rapidly varying gradients. By combining gradient tracking with gradient clipping, it achieves provable convergence over directed graphs without assuming bounded gradient dissimilarity. Numerical experiments on LIBSVM and CIFAR-10 show superior stability and faster convergence than existing methods.
Decentralized optimization has become a fundamental tool for large-scale learning systems; however, most existing methods rely on the classical Lipschitz smoothness assumption, which is often violated in problems with rapidly varying gradients. Motivated by this limitation, we study decentralized optimization under the generalized $(L_0, L_1)$-smoothness framework, in which the Hessian norm is allowed to grow linearly with the gradient norm, thereby accommodating rapidly varying gradients beyond classical Lipschitz smoothness. We integrate gradient-tracking techniques with gradient clipping and carefully design the clipping threshold to ensure accurate convergence over directed communication graphs under generalized smoothness. In contrast to existing distributed optimization results under generalized smoothness that require a bounded gradient dissimilarity assumption, our results remain valid even when the gradient dissimilarity is unbounded, making the proposed framework more applicable to realistic heterogeneous data environments. We validate our approach via numerical experiments on standard benchmark datasets, including LIBSVM and CIFAR-10, using regularized logistic regression and convolutional neural networks, demonstrating superior stability and faster convergence over existing methods.
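To make the combination of gradient tracking and gradient clipping concrete, the following is a minimal sketch, not the paper's algorithm. It assumes a push-pull style update with a row-stochastic matrix `A` for the decision variables and a column-stochastic matrix `B` for the gradient trackers, both built from a small directed graph, and it assumes a constant clipping threshold `tau`, whereas the abstract states that the threshold is carefully designed. The function names, the step size, the example graph, and the quartic local losses (whose gradients are not globally Lipschitz, while the Hessian norm satisfies a bound of the form $\|\nabla^2 f(x)\| \le L_0 + L_1 \|\nabla f(x)\|$) are all illustrative assumptions.

```python
import numpy as np


def clip(v, tau):
    """Rescale v so that its Euclidean norm is at most tau."""
    norm = np.linalg.norm(v)
    return v if norm <= tau else v * (tau / norm)


def clipped_gradient_tracking(grads, x0, A, B, step, tau, iters):
    """Illustrative clipped gradient tracking over a directed graph.

    grads : list of n callables; grads[i](x) returns agent i's local gradient
    x0    : (n, d) array of initial local iterates
    A     : (n, n) row-stochastic matrix used to mix decision variables
    B     : (n, n) column-stochastic matrix used to mix gradient trackers
    step  : constant step size (assumption)
    tau   : constant clipping threshold (assumption)
    """
    n = len(grads)
    x = np.array(x0, dtype=float)
    g = np.stack([grads[i](x[i]) for i in range(n)])
    y = g.copy()  # trackers start at the local gradients
    for _ in range(iters):
        # Each agent moves along its clipped tracker after mixing with in-neighbors.
        d = np.stack([clip(y[i], tau) for i in range(n)])
        x = A @ x - step * d
        # Tracker update: mix, then add the change in the local gradient.
        g_new = np.stack([grads[i](x[i]) for i in range(n)])
        y = B @ y + g_new - g
        g = g_new
    return x, y, g


# Hypothetical directed graph on 3 agents: adj[i, j] = 1 if j sends to i
# (self-loops included). The graph is strongly connected but not balanced.
adj = np.array([[1, 0, 1],
                [1, 1, 0],
                [1, 1, 1]], dtype=float)
A = adj / adj.sum(axis=1, keepdims=True)  # rows sum to 1
B = adj / adj.sum(axis=0, keepdims=True)  # columns sum to 1

# Heterogeneous quartic local losses f_i(x) = (x - b_i)^4 / 4.
targets = np.array([[-1.0], [0.5], [3.0]])
grads = [lambda x, b=b: (x - b) ** 3 for b in targets]

x, y, g = clipped_gradient_tracking(
    grads, np.zeros((3, 1)), A, B, step=0.05, tau=1.0, iters=200
)
print("local iterates:", x.ravel())
```

Because `B` is column-stochastic and the trackers are initialized at the local gradients, the sum of the trackers equals the sum of the current local gradients at every iteration. This is the property that lets each agent follow the global gradient direction without any bound on how much the local gradients differ. Clipping is applied only to the direction used in the iterate update, so it does not disturb that invariant.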