Convergence and Privacy of Decentralized Nonconvex Optimization with Gradient Clipping and Communication Compression
This work addresses communication efficiency and privacy in decentralized machine learning, offering incremental improvements by combining existing techniques like gradient clipping and compression for nonconvex settings.
The paper tackles decentralized nonconvex optimization by integrating gradient clipping with communication compression. It proposes two PORTER variants that achieve convergence guarantees without bounded gradient assumptions, and its results highlight trade-offs among convergence rate, compression ratio, network connectivity, and privacy.
Achieving communication efficiency in decentralized machine learning has attracted significant attention, with communication compression recognized as an effective technique in algorithm design. This paper takes a first step toward understanding the role of gradient clipping, a popular strategy in practice, in decentralized nonconvex optimization with communication compression. We propose PORTER, which considers two variants of gradient clipping, applied either before or after averaging a mini-batch of stochastic gradients: the former variant, PORTER-DP, allows local differential privacy analysis with additional Gaussian perturbation, and the latter variant, PORTER-GC, helps to stabilize training. We develop a novel analysis framework that establishes their convergence guarantees without the stringent bounded gradient assumption. To the best of our knowledge, our work provides the first convergence analysis for decentralized nonconvex optimization with gradient clipping and communication compression, highlighting the trade-offs between convergence rate, compression ratio, network connectivity, and privacy.
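To make the distinction between the two clipping placements concrete, the following is a minimal NumPy sketch, not the PORTER algorithm itself. It assumes the standard norm-clipping operator (rescale a vector so its Euclidean norm is at most a threshold), which may differ from the operator used in the paper, and it uses top-k sparsification as one common example of a compressor. All function and parameter names (`clip`, `clip_then_average`, `average_then_clip`, `top_k`, `tau`, `sigma`) are hypothetical and chosen for illustration only; the decentralized communication and mixing steps are omitted.

```python
import numpy as np

def clip(g, tau):
    """Rescale g so that its Euclidean norm is at most tau (assumed operator)."""
    norm = np.linalg.norm(g)
    return g if norm <= tau else g * (tau / norm)

def clip_then_average(per_sample_grads, tau, sigma, rng):
    """Clipping BEFORE the mini-batch average (the DP-style placement).

    Each per-sample gradient is clipped, the clipped gradients are averaged,
    and Gaussian noise with standard deviation sigma is added to the result.
    """
    clipped = np.stack([clip(g, tau) for g in per_sample_grads])
    mean = clipped.mean(axis=0)
    return mean + rng.normal(0.0, sigma, size=mean.shape)

def average_then_clip(per_sample_grads, tau):
    """Clipping AFTER the mini-batch average (the stabilization-style placement)."""
    return clip(np.mean(per_sample_grads, axis=0), tau)

def top_k(x, k):
    """Illustrative compressor: keep the k largest-magnitude entries of x."""
    out = np.zeros_like(x)
    idx = np.argpartition(np.abs(x), -k)[-k:]
    out[idx] = x[idx]
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    grads = rng.normal(size=(8, 5))  # 8 per-sample gradients of dimension 5
    g_dp = clip_then_average(grads, tau=1.0, sigma=0.1, rng=rng)
    g_gc = average_then_clip(grads, tau=1.0)
    # Only the compressed vector would be sent to neighbors in this sketch.
    print(top_k(g_dp, k=2))
    print(top_k(g_gc, k=2))
```

Clipping each sample before averaging bounds the contribution of any single data point, which is what makes a noise-based privacy analysis possible; clipping the averaged gradient bounds only the size of the update.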