Michał Dereziński

10.0DSJul 9

Michał Dereziński, Ethan N. Epperly, Raphael A. Meyer · berkeley

Matrix--vector algorithms, particularly Krylov subspace methods, are widely viewed as the most effective algorithms for solving large systems of linear equations. This paper establishes lower bounds on the worst-case number of matrix--vector products needed by such an algorithm to approximately solve a general linear system. The first main result is that, for any matrix--vector algorithm which is allowed the use of randomization and can perform products with both a matrix and its transpose, $Ω(κ\log(1/\varepsilon))$ matrix--vector products are necessary to solve a linear system with condition number $κ$ to accuracy $\varepsilon$, matching an upper bound for conjugate gradient on the normal equations. The second main result is that one-sided algorithms, which lack access to the transpose, must use $n$ matrix--vector products to solve an $n \times n$ linear system, even when the problem is perfectly conditioned. Both main results include explicit constants that match known upper bounds up to a factor of four. These results rigorously demonstrate the limitations of matrix--vector algorithms and confirm the optimality of widely used Krylov subspace algorithms.

6.0LGApr 10

Last-Iterate Convergence of Randomized Kaczmarz and SGD with Greedy Step Size

Michał Dereziński, Xiaoyu Dong

We study last-iterate convergence of SGD with greedy step size over smooth quadratics in the interpolation regime, a setting which captures the classical Randomized Kaczmarz algorithm as well as other popular iterative linear system solvers. For these methods, we show that the $t$-th iterate attains an $O(1/t^{3/4})$ convergence rate, addressing a question posed by Attia, Schliserman, Sherman, and Koren, who gave an $O(1/t^{1/2})$ guarantee for this setting. In the proof, we introduce the family of stochastic contraction processes, whose behavior can be described by the evolution of a certain deterministic eigenvalue equation, which we analyze via a careful discrete-to-continuous reduction.

Michał Dereziński

2 Papers