DS LGMay 14, 2019

Dimensionality Reduction for Tukey Regression

Kenneth L. Clarkson, Ruosong Wang, David P. Woodruff

arXiv:1905.05376v111.734 citations

Originality Incremental advance

AI Analysis

This work addresses the overconstrained Tukey regression problem, offering new methods that are fast and easy to implement, though it is incremental as it builds on existing heuristic solvers.

The paper introduces the first dimensionality reduction methods for the Tukey regression problem, which uses a loss function that becomes constant for large residual errors, and demonstrates their practicality with empirical results using existing solvers.

We give the first dimensionality reduction methods for the overconstrained Tukey regression problem. The Tukey loss function $\|y\|_M = \sum_i M(y_i)$ has $M(y_i) \approx |y_i|^p$ for residual errors $y_i$ smaller than a prescribed threshold $τ$, but $M(y_i)$ becomes constant for errors $|y_i| > τ$. Our results depend on a new structural result, proven constructively, showing that for any $d$-dimensional subspace $L \subset \mathbb{R}^n$, there is a fixed bounded-size subset of coordinates containing, for every $y \in L$, all the large coordinates, with respect to the Tukey loss function, of $y$. Our methods reduce a given Tukey regression problem to a smaller weighted version, whose solution is a provably good approximate solution to the original problem. Our reductions are fast, simple and easy to implement, and we give empirical results demonstrating their practicality, using existing heuristic solvers for the small versions. We also give exponential-time algorithms giving provably good solutions, and hardness results suggesting that a significant speedup in the worst case is unlikely.

View on arXiv PDF

Similar