LG | DS | ML | Nov 15, 2021

Conditional Linear Regression for Heterogeneous Covariances

arXiv:2111.07834v1 | 1 citation
Originality: Incremental advance
AI Analysis

This work addresses a specific statistical modeling challenge for scenarios with heterogeneous data, representing an incremental improvement over prior algorithms.

The paper tackles the problem of fitting linear regression only to a subset of data identified by a DNF formula, where previous methods required similar covariances across terms. It presents a polynomial-time algorithm that removes this requirement, improving flexibility in handling heterogeneous covariances.

Machine learning and statistical models often attempt to describe the majority of the data. However, there may be situations where only a fraction of the data can be fit well by a linear regression model. Here, we are interested in the case where such inliers can be identified by a Disjunctive Normal Form (DNF) formula. We give a polynomial-time algorithm for the conditional linear regression task, which identifies a DNF condition together with a linear predictor on the corresponding portion of the data. In this work, we improve on previous algorithms by removing the requirement that the covariances of the data satisfying each term of the condition must all be very similar in spectral norm to the covariance of the overall condition.
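To make the task concrete, here is a minimal Python sketch of the objective only. It is not the paper's algorithm: it assumes the DNF condition is already given, whereas the paper's contribution is finding the condition and the predictor together in polynomial time. The function names (`eval_dnf`, `fit_on_condition`), the encoding of a term as a list of `(index, value)` literals, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def eval_dnf(dnf, B):
    """Return a boolean mask of the rows of B that satisfy the DNF.

    dnf: list of terms; each term is a list of (column_index, required_value)
         literals (a hypothetical encoding chosen for this sketch).
    B:   (n, d) boolean array of attributes the condition is defined over.
    """
    mask = np.zeros(B.shape[0], dtype=bool)
    for term in dnf:
        term_mask = np.ones(B.shape[0], dtype=bool)
        for idx, val in term:
            term_mask &= (B[:, idx] == val)   # conjunction of literals
        mask |= term_mask                      # disjunction of terms
    return mask

def fit_on_condition(dnf, B, X, y):
    """Least-squares fit restricted to the points selected by the DNF.

    Returns the weights, the mean squared error on the selected points,
    and the fraction of the data the condition covers.
    """
    mask = eval_dnf(dnf, B)
    w, *_ = np.linalg.lstsq(X[mask], y[mask], rcond=None)
    mse = float(np.mean((X[mask] @ w - y[mask]) ** 2))
    coverage = float(mask.mean())
    return w, mse, coverage

# Synthetic illustration: only points satisfying (b0 AND b1) OR (NOT b2)
# follow a linear model; the rest are unrelated noise.
rng = np.random.default_rng(0)
n = 500
B = rng.integers(0, 2, size=(n, 3)).astype(bool)
X = rng.normal(size=(n, 2))
w_true = np.array([2.0, -1.0])
dnf = [[(0, True), (1, True)], [(2, False)]]
inliers = eval_dnf(dnf, B)
y = np.where(inliers, X @ w_true, 10.0 * rng.normal(size=n))

w, mse, coverage = fit_on_condition(dnf, B, X, y)
```

On this toy data the fit restricted to the condition recovers `w_true`, while a fit on all points would be pulled off by the noise. The hard part, which this sketch leaves out, is searching over candidate conditions when the terms select subpopulations with different covariances.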

