The gradient complexity of linear regression
This provides foundational theoretical limits for optimization algorithms in machine learning, though it is incremental as it builds on existing complexity frameworks.
The paper tackles the computational complexity of linear regression and eigenvector computation in a matrix-vector product oracle model, showing that Θ(d) oracle calls are necessary and sufficient for polynomial accuracy, even with randomization.
We investigate the computational complexity of several basic linear algebra primitives, including largest eigenvector computation and linear regression, in the computational model that allows access to the data via a matrix-vector product oracle. We show that for polynomial accuracy, $Θ(d)$ calls to the oracle are necessary and sufficient even for a randomized algorithm. Our lower bound is based on a reduction to estimating the least eigenvalue of a random Wishart matrix. This simple distribution enables a concise proof, leveraging a few key properties of the random Wishart ensemble.