DS LG MLMar 31, 2023

Almost Linear Constant-Factor Sketching for $\ell_1$ and Logistic Regression

Alexander Munteanu, Simon Omlor, David Woodruff

arXiv:2304.00051v16.69 citationsh-index: 58Has Code

Originality Incremental advance

AI Analysis

This work improves efficiency for large-scale data analysis in machine learning, though it is incremental over prior sketching methods.

The paper tackles the problem of reducing sketching dimensions for ℓ₁ and logistic regression, achieving near-linear dimensions with O(1)-approximation and offering tradeoffs for 1+ε approximation in input sparsity time.

We improve upon previous oblivious sketching and turnstile streaming results for $\ell_1$ and logistic regression, giving a much smaller sketching dimension achieving $O(1)$-approximation and yielding an efficient optimization problem in the sketch space. Namely, we achieve for any constant $c>0$ a sketching dimension of $\tilde{O}(d^{1+c})$ for $\ell_1$ regression and $\tilde{O}(μd^{1+c})$ for logistic regression, where $μ$ is a standard measure that captures the complexity of compressing the data. For $\ell_1$-regression our sketching dimension is near-linear and improves previous work which either required $Ω(\log d)$-approximation with this sketching dimension, or required a larger $\operatorname{poly}(d)$ number of rows. Similarly, for logistic regression previous work had worse $\operatorname{poly}(μd)$ factors in its sketching dimension. We also give a tradeoff that yields a $1+\varepsilon$ approximation in input sparsity time by increasing the total size to $(d\log(n)/\varepsilon)^{O(1/\varepsilon)}$ for $\ell_1$ and to $(μd\log(n)/\varepsilon)^{O(1/\varepsilon)}$ for logistic regression. Finally, we show that our sketch can be extended to approximate a regularized version of logistic regression where the data-dependent regularizer corresponds to the variance of the individual logistic losses.

View on arXiv PDF Code

Similar