Stability and Generalization of the Decentralized Stochastic Gradient Descent Ascent Algorithm
This work studies the generalization of decentralized minimax algorithms for machine learning tasks. Its theoretical contributions are incremental but specific to the decentralized setting.
The paper studies the generalization of the decentralized stochastic gradient descent ascent (D-SGDA) algorithm for minimax problems. It shows that the decentralized structure does not harm stability and generalization, so D-SGDA can generalize as well as vanilla SGDA in some cases. It also analyzes how the topology affects the generalization bound, beyond basic factors such as sample size and learning rate.
The growing size of available data has attracted increasing interest in solving minimax problems in a decentralized manner for various machine learning tasks. Previous theoretical research has primarily focused on the convergence rate and communication complexity of decentralized minimax algorithms, with little attention given to their generalization. In this paper, we investigate the primal-dual generalization bound of the decentralized stochastic gradient descent ascent (D-SGDA) algorithm using the approach of algorithmic stability under both convex-concave and nonconvex-nonconcave settings. Our theory refines the algorithmic stability in a decentralized manner and demonstrates that the decentralized structure does not destroy the stability and generalization of D-SGDA, implying that it can generalize as well as the vanilla SGDA in certain situations. Our results analyze the impact of different topologies on the generalization bound of the D-SGDA algorithm beyond trivial factors such as sample sizes, learning rates, and iterations. We also evaluate the optimization error and balance it with the generalization gap to obtain the optimal population risk of D-SGDA in the convex-concave setting. Additionally, we perform several numerical experiments which validate our theoretical findings.
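To make the object of study concrete, the following is a minimal sketch of one common form of the D-SGDA update. The abstract does not spell out the update rule, so everything here is an assumption for illustration: each node mixes its neighbors' iterates through a doubly stochastic matrix `W`, then takes a local stochastic descent step in `x` and an ascent step in `y`. The ring topology, the toy loss `f(x, y; a) = 0.5*x**2 + a*x*y - 0.5*y**2`, the step sizes, and the function names `ring_mixing_matrix` and `d_sgda` are hypothetical choices, not the paper's setup.

```python
import numpy as np


def ring_mixing_matrix(m):
    """Doubly stochastic mixing matrix for a ring of m >= 3 nodes.

    Assumed topology: each node averages itself and its two ring
    neighbors with equal weight 1/3.
    """
    W = np.zeros((m, m))
    for i in range(m):
        W[i, i] = 1.0 / 3.0
        W[i, (i - 1) % m] = 1.0 / 3.0
        W[i, (i + 1) % m] = 1.0 / 3.0
    return W


def d_sgda(data, W, eta_x=0.05, eta_y=0.05, iters=2000, seed=0):
    """Sketch of D-SGDA on a toy strongly-convex-strongly-concave loss.

    data: array of shape (m, n); row i holds node i's local scalar samples a.
    Toy local loss (an assumption): f(x, y; a) = 0.5*x**2 + a*x*y - 0.5*y**2,
    whose saddle point is (0, 0) for every a.
    """
    rng = np.random.default_rng(seed)
    m, n = data.shape
    x = np.ones(m)  # one primal iterate per node
    y = np.ones(m)  # one dual iterate per node
    for _ in range(iters):
        # Each node draws one sample from its own local dataset.
        a = data[np.arange(m), rng.integers(n, size=m)]
        grad_x = x + a * y  # df/dx at the current local iterate
        grad_y = a * x - y  # df/dy at the current local iterate
        # Gossip averaging with neighbors, then local descent / ascent.
        x = W @ x - eta_x * grad_x
        y = W @ y + eta_y * grad_y
    return x, y


# Usage: 5 nodes, 20 local samples each, with |a| <= 0.5.
W = ring_mixing_matrix(5)
data = np.random.default_rng(1).uniform(-0.5, 0.5, size=(5, 20))
x, y = d_sgda(data, W)
print(np.abs(x).max(), np.abs(y).max())
```

In this sketch the topology enters only through `W`. Swapping in a different doubly stochastic matrix (for example, a fully connected average) is the natural way to probe how the topology changes behavior.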