DSLGMLJul 14, 2021

Oblivious sketching for logistic regression

arXiv:2107.06615v123 citations
Originality Highly original
AI Analysis

This provides an efficient streaming solution for logistic regression, which is incremental but addresses a known bottleneck in large-scale data processing.

The paper tackles the problem of solving logistic regression in a single pass over streaming data by introducing the first data-oblivious sketch that compresses d-dimensional datasets from n to poly(μd log n) weighted points, achieving an O(log n)-approximation (or O(1) with modifications) to the original problem.

What guarantees are possible for solving logistic regression in one pass over a data stream? To answer this question, we present the first data oblivious sketch for logistic regression. Our sketch can be computed in input sparsity time over a turnstile data stream and reduces the size of a $d$-dimensional data set from $n$ to only $\operatorname{poly}(μd\log n)$ weighted points, where $μ$ is a useful parameter which captures the complexity of compressing the data. Solving (weighted) logistic regression on the sketch gives an $O(\log n)$-approximation to the original problem on the full data set. We also show how to obtain an $O(1)$-approximation with slight modifications. Our sketches are fast, simple, easy to implement, and our experiments demonstrate their practicality.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes