Shuffling the Stochastic Mirror Descent via Dual Lipschitz Continuity and Kernel Conditioning
This work addresses a theoretical bottleneck in optimization for machine learning researchers, enabling analysis of advanced techniques under weaker assumptions, though it is incremental as it builds on the relative smoothness framework.
The paper tackled the challenge of analyzing optimization algorithms like random reshuffling mirror descent without relying on global Lipschitz smoothness, by introducing the dual kernel conditioning (DKC) regularity condition to ensure dual Lipschitz continuity, and established the first complexity bounds and iterate convergence for constrained nonconvex relative smooth problems.
The global Lipschitz smoothness condition underlies most convergence and complexity analyses via two key consequences: the descent lemma and the gradient Lipschitz continuity. How to study the performance of optimization algorithms in the absence of Lipschitz smoothness remains an active area. The relative smoothness framework from Bauschke-Bolte-Teboulle (2017) and Lu-Freund-Nesterov (2018) provides an extended descent lemma, ensuring convergence of Bregman-based proximal gradient methods and their vanilla stochastic counterparts. However, many widely used techniques (e.g., momentum schemes, random reshuffling, and variance reduction) additionally require the Lipschitz-type bound for gradient deviations, leaving their analysis under relative smoothness an open area. To resolve this issue, we introduce the dual kernel conditioning (DKC) regularity condition to regulate the local relative curvature of the kernel functions. Combined with the relative smoothness, DKC provides a dual Lipschitz continuity for gradients: even though the gradient mapping is not Lipschitz in the primal space, it preserves Lipschitz continuity in the dual space induced by a mirror map. We verify that DKC is widely satisfied by popular kernels and is closed under affine composition and conic combination. With these novel tools, we establish the first complexity bounds as well as the iterate convergence of random reshuffling mirror descent for constrained nonconvex relative smooth problems.