ML · LG · OC · Feb 8, 2019

Scalable Holistic Linear Regression

arXiv:1902.03272v2 · 24 citations
AI Analysis

This work addresses scalability issues in statistical modeling for researchers and practitioners dealing with large datasets, representing an incremental improvement over prior methods.

The paper tackles the problem of scaling holistic linear regression by introducing a new algorithm that models significance and multicollinearity as lazy constraints, enabling it to handle datasets with tens of thousands of samples compared to previous methods limited to hundreds.

We propose a new scalable algorithm for holistic linear regression, building on Bertsimas & King (2016). Specifically, we develop new theory to model significance and multicollinearity as lazy constraints rather than checking the conditions iteratively. The resulting algorithm scales to sample sizes $n$ in the 10,000s, compared with the low 100s in the previous framework. Computational results on real and synthetic datasets show that it improves substantially on previous algorithms in accuracy, false detection rate, computational time, and scalability.
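To make the idea of a lazy constraint concrete, here is a minimal toy sketch; it is not the paper's algorithm. It assumes a pairwise-correlation cap `rho_max` as the multicollinearity condition, replaces the mixed-integer solver with brute-force enumeration of supports, and omits the significance condition entirely. All function names and the data are hypothetical. Rather than forbidding every highly correlated pair up front, the loop adds a pair to `cuts` only after a candidate solution violates the condition.

```python
from itertools import combinations

def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / (va * vb) ** 0.5

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for c in range(m):
        p = max(range(c, m), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, m):
            f = M[r][c] / M[c][c]
            for j in range(c, m + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * m
    for r in reversed(range(m)):
        tail = sum(M[r][j] * x[j] for j in range(r + 1, m))
        x[r] = (M[r][m] - tail) / M[r][r]
    return x

def rss(cols, y, support):
    """Residual sum of squares of a no-intercept least-squares fit on `support`."""
    S = [cols[j] for j in support]
    gram = [[sum(u * v for u, v in zip(a, b)) for b in S] for a in S]
    rhs = [sum(u * v for u, v in zip(a, y)) for a in S]
    beta = solve(gram, rhs)
    fitted = [sum(bj * col[i] for bj, col in zip(beta, S)) for i in range(len(y))]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

def lazy_best_subset(cols, y, k, rho_max):
    """Best subset of size <= k; correlation cuts are added only when violated."""
    cuts = set()  # forbidden feature pairs, discovered lazily
    supports = [s for r in range(1, k + 1)
                for s in combinations(range(len(cols)), r)]
    while True:
        # "Master problem": best support that respects the cuts found so far.
        feasible = [s for s in supports
                    if not any(pair in cuts for pair in combinations(s, 2))]
        best = min(feasible, key=lambda s: rss(cols, y, s))
        # "Separation": check the candidate against the correlation cap.
        violated = [(i, j) for i, j in combinations(best, 2)
                    if abs(pearson(cols[i], cols[j])) > rho_max]
        if not violated:
            return best, cuts
        cuts.update(violated)

# Hypothetical toy data: feature 1 is a near copy of feature 0.
x0 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x1 = [1.1, 2.1, 2.9, 3.9, 5.1, 6.1]
x2 = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
cols = [x0, x1, x2]
y = [a + b + 0.01 * c for a, b, c in zip(x0, x1, x2)]

support, cuts = lazy_best_subset(cols, y, k=2, rho_max=0.9)
print("selected support:", support, "| cuts added:", sorted(cuts))
```

On this data the unconstrained best support pairs the two near-duplicate features, so a single cut is generated and the search then settles on a support that avoids using both. In a real mixed-integer solver the same pattern would typically be implemented through a lazy-constraint callback rather than by re-enumerating supports.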

