Worst-case Complexity of Cyclic Coordinate Descent: $O(n^2)$ Gap with Randomized Version
For researchers in optimization, it resolves a long-standing open question about the worst-case gap between cyclic and randomized coordinate descent, showing the gap is indeed quadratic in dimension.
This paper proves that cyclic coordinate descent can be O(n^2) times slower than randomized coordinate descent in the worst case for minimizing convex quadratics, establishing a rigorous example with complexity O(n^4 κ_CD log(1/ε)) versus O(n^2 κ_CD log(1/ε)).
This paper concerns the worst-case complexity of cyclic coordinate descent (C-CD) for minimizing a convex quadratic function, which is equivalent to Gauss-Seidel method and can be transformed to Kaczmarz method and projection onto convex sets (POCS). We observe that the known provable complexity of C-CD can be $O(n^2)$ times slower than randomized coordinate descent (R-CD), but no example was rigorously proven to exhibit such a large gap. In this paper we show that the gap indeed exists. We prove that there exists an example for which C-CD takes at least $O(n^4 κ_{\text{CD}} \log\frac{1}ε)$ operations, where $κ_{\text{CD}}$ is related to Demmel's condition number and it determines the convergence rate of R-CD. It implies that in the worst case C-CD can indeed be $O(n^2)$ times slower than R-CD, which has complexity $O( n^2 κ_{\text{CD}} \log\frac{1}ε)$. Note that for this example, the gap exists for any fixed update order, not just a particular order. Based on the example, we establish several almost tight complexity bounds of C-CD for quadratic problems. One difficulty with the analysis is that the spectral radius of a non-symmetric iteration matrix does not necessarily constitute a \textit{lower bound} for the convergence rate. An immediate consequence is that for Gauss-Seidel method, Kaczmarz method and POCS, there is also an $O(n^2) $ gap between the cyclic versions and randomized versions (for solving linear systems). We also show that the classical convergence rate of POCS by Smith, Solmon and Wager [1] is always worse and sometimes can be infinitely times worse than our bound.