Conditional Linear Regression for Heterogeneous Covariances
This work addresses conditional linear regression in the setting where the data selected by different terms of the condition have heterogeneous covariances, an incremental improvement over prior algorithms.
The paper tackles the problem of fitting a linear regression to only the subset of the data identified by a DNF formula, where previous methods required the data satisfying each term to have similar covariances. It presents a polynomial-time algorithm that removes this requirement, so the terms of the condition may select data with heterogeneous covariances.
Machine learning and statistical models often attempt to describe the majority of the data. However, there may be situations where only a fraction of the data can be fit well by a linear regression model. Here, we are interested in the case where such inliers can be identified by a Disjunctive Normal Form (DNF) formula. We give a polynomial-time algorithm for the conditional linear regression task, which identifies a DNF condition together with a linear predictor for the corresponding portion of the data. In this work, we improve on previous algorithms by removing the requirement that the covariances of the data satisfying each term of the condition all be very similar in spectral norm to the covariance of the data satisfying the overall condition.
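To make the task concrete, the following is a minimal sketch of the objective only, not the algorithm from the paper. It assumes the DNF condition is already known, evaluates it on Boolean attributes, fits ordinary least squares on the selected rows, and measures how far each term's covariance is from the covariance of the whole condition in spectral norm. The names `eval_dnf`, `fit_on_condition`, and `covariance_gaps`, the synthetic data, and the use of uncentered second moments as the covariance are all illustrative assumptions.

```python
import numpy as np

def term_mask(term, B):
    """Rows of Boolean matrix B satisfying one term, given as [(column, value), ...]."""
    mask = np.ones(B.shape[0], dtype=bool)
    for col, val in term:
        mask &= (B[:, col] == val)
    return mask

def eval_dnf(terms, B):
    """Rows satisfying the DNF: an OR over terms, each an AND of literals."""
    mask = np.zeros(B.shape[0], dtype=bool)
    for term in terms:
        mask |= term_mask(term, B)
    return mask

def fit_on_condition(terms, B, X, y):
    """Least squares restricted to the rows selected by the condition."""
    mask = eval_dnf(terms, B)
    w, *_ = np.linalg.lstsq(X[mask], y[mask], rcond=None)
    mse = float(np.mean((X[mask] @ w - y[mask]) ** 2))
    return w, mse, float(mask.mean())

def covariance_gaps(terms, B, X):
    """Spectral-norm distance between each term's second moment and the condition's."""
    def second_moment(mask):
        return X[mask].T @ X[mask] / mask.sum()
    cov_all = second_moment(eval_dnf(terms, B))
    return [float(np.linalg.norm(second_moment(term_mask(t, B)) - cov_all, 2))
            for t in terms]

# Synthetic data (assumed for illustration): inliers satisfy
# (b0 AND b1) OR (NOT b0 AND b2), and the first term has a much larger covariance.
rng = np.random.default_rng(0)
n = 2000
B = rng.integers(0, 2, size=(n, 3)).astype(bool)
X = rng.normal(size=(n, 2))
terms = [[(0, True), (1, True)], [(0, False), (2, True)]]
X[term_mask(terms[0], B)] *= 5.0           # heterogeneous covariances across terms

w_true = np.array([2.0, -1.0])
inliers = eval_dnf(terms, B)
y = rng.normal(scale=10.0, size=n)          # outliers: no linear relationship
y[inliers] = X[inliers] @ w_true + rng.normal(scale=0.01, size=inliers.sum())

w_hat, mse, coverage = fit_on_condition(terms, B, X, y)
gaps = covariance_gaps(terms, B, X)
print("weights:", w_hat, "mse:", mse, "coverage:", coverage)
print("per-term covariance gaps (spectral norm):", gaps)
```

In this sketch the per-term gaps are large by construction, which is the regime that a similar-covariance requirement would exclude. The hard part of the actual task, searching for the condition itself in polynomial time, is not attempted here.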