Oblivious sketching for logistic regression
This provides an efficient streaming solution for logistic regression, which is incremental but addresses a known bottleneck in large-scale data processing.
The paper tackles the problem of solving logistic regression in a single pass over streaming data by introducing the first data-oblivious sketch that compresses d-dimensional datasets from n to poly(μd log n) weighted points, achieving an O(log n)-approximation (or O(1) with modifications) to the original problem.
What guarantees are possible for solving logistic regression in one pass over a data stream? To answer this question, we present the first data oblivious sketch for logistic regression. Our sketch can be computed in input sparsity time over a turnstile data stream and reduces the size of a $d$-dimensional data set from $n$ to only $\operatorname{poly}(μd\log n)$ weighted points, where $μ$ is a useful parameter which captures the complexity of compressing the data. Solving (weighted) logistic regression on the sketch gives an $O(\log n)$-approximation to the original problem on the full data set. We also show how to obtain an $O(1)$-approximation with slight modifications. Our sketches are fast, simple, easy to implement, and our experiments demonstrate their practicality.