A Unified Analysis of Stochastic Gradient Descent with Arbitrary Data Permutations and Beyond
This work addresses the theoretical gap in analyzing permutation-based SGD algorithms, particularly for dependent permutations, which is incremental but important for optimization and machine learning researchers.
The paper tackles the problem of providing a unified convergence analysis for permutation-based Stochastic Gradient Descent (SGD) by categorizing existing algorithms and developing a general assumption to capture inter-epoch permutation dependencies, resulting in a framework that incorporates all representative algorithms and extends to Federated Learning with client ordering.
We aim to provide a unified convergence analysis for permutation-based Stochastic Gradient Descent (SGD), where data examples are permuted before each epoch. By examining the relations among permutations, we categorize existing permutation-based SGD algorithms into four categories: Arbitrary Permutations, Independent Permutations (including Random Reshuffling), One Permutation (including Incremental Gradient, Shuffle One and Nice Permutation) and Dependent Permutations (including GraBs Lu et al., 2022; Cooper et al., 2023). Existing unified analyses failed to encompass the Dependent Permutations category due to the inter-epoch dependencies in its permutations. In this work, we propose a general assumption that captures the inter-epoch permutation dependencies. Using the general assumption, we develop a unified framework for permutation-based SGD with arbitrary permutations of examples, incorporating all the aforementioned representative algorithms. Furthermore, we adapt our framework on example ordering in SGD for client ordering in Federated Learning (FL). Specifically, we develop a unified framework for regularized-participation FL with arbitrary permutations of clients.