A Deterministic Streaming Sketch for Ridge Regression
This paper provides a space-efficient deterministic method for ridge regression on streaming and distributed data, addressing a bottleneck for large-scale machine learning applications.
The paper tackles the problem of estimating ridge regression in a streaming setting with limited space. For a large enough regularization parameter, it achieves a solution within ε L2 error using only O(d/ε) space. This makes it the first deterministic streaming algorithm to use sub-quadratic space with guaranteed solution error and risk bounds.
We provide a deterministic space-efficient algorithm for estimating ridge regression. For $n$ data points with $d$ features and a large enough regularization parameter, we provide a solution within $\varepsilon$ L$_2$ error using only $O(d/\varepsilon)$ space. This is the first $o(d^2)$ space deterministic streaming algorithm with guaranteed solution error and risk bound for this classic problem. The algorithm sketches the covariance matrix by variants of Frequent Directions, which implies it can operate in insertion-only streams and a variety of distributed data settings. In comparisons to randomized sketching algorithms on synthetic and real-world datasets, our algorithm has less empirical error using less space and similar time.
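To make the idea concrete, the following is a minimal sketch of ridge regression on top of a covariance sketch. It uses plain Frequent Directions with a doubled buffer, not the specific variants the paper analyzes. It also assumes that the $d$-dimensional vector $A^\top b$ is accumulated exactly alongside the sketch. The class name `FDRidgeSketch`, the sketch size `ell`, and the regularization parameter `gamma` are illustrative names, not taken from the paper.

```python
import numpy as np


class FDRidgeSketch:
    """Illustrative ridge regression over a Frequent Directions sketch.

    Assumption: plain Frequent Directions with a 2*ell-row buffer, which may
    differ from the variants used in the paper. Space is O(ell * d).
    """

    def __init__(self, d, ell):
        self.d = d
        self.ell = ell
        self.B = np.zeros((2 * ell, d))  # sketch of the data matrix A
        self.next_row = 0                # index of the next free row in B
        self.c = np.zeros(d)             # exact running value of A^T b

    def _shrink(self):
        # Rotate B onto its right singular vectors and subtract the
        # (ell+1)-th squared singular value from every squared singular value.
        _, s, vt = np.linalg.svd(self.B, full_matrices=False)
        delta = s[self.ell] ** 2 if len(s) > self.ell else 0.0
        s_shrunk = np.sqrt(np.maximum(s ** 2 - delta, 0.0))
        k = len(s)
        self.B = np.zeros((2 * self.ell, self.d))
        self.B[:k] = s_shrunk[:, None] * vt
        # Rows at index ell and beyond are zero after shrinking (when they exist).
        self.next_row = min(self.ell, k)

    def update(self, a, b):
        """Process one streamed example: feature row a and target b."""
        if self.next_row == 2 * self.ell:
            self._shrink()
        self.B[self.next_row] = a
        self.next_row += 1
        self.c += b * a

    def solve(self, gamma):
        """Return the approximate ridge solution (B^T B + gamma I)^{-1} A^T b."""
        gram = self.B.T @ self.B + gamma * np.eye(self.d)
        return np.linalg.solve(gram, self.c)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((500, 20))
    b = rng.standard_normal(500)
    sketch = FDRidgeSketch(d=20, ell=8)
    for row, target in zip(A, b):
        sketch.update(row, target)
    x_sketch = sketch.solve(gamma=50.0)
    x_exact = np.linalg.solve(A.T @ A + 50.0 * np.eye(20), A.T @ b)
    print("solution error:", np.linalg.norm(x_sketch - x_exact))
```

Because each update touches only the buffer and the vector `c`, the sketch processes an insertion-only stream in a single pass. When `ell` is at least `d`, the shrink step removes nothing and the result matches the exact ridge solution.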