Conditional Linear Regression
This addresses the issue of poor model performance on heterogeneous datasets for researchers and practitioners in machine learning and statistics, offering a targeted approach to improve prediction accuracy in specific segments.
The paper tackles the problem of finding a small subset of data where a linear model can achieve accurate predictions, rather than modeling the entire dataset, and presents an efficient algorithm with theoretical analysis for this conditional linear regression task.
Work in machine learning and statistics commonly focuses on building models that capture the vast majority of data, possibly ignoring a segment of the population as outliers. However, there does not often exist a good model on the whole dataset, so we seek to find a small subset where there exists a useful model. We are interested in finding a linear rule capable of achieving more accurate predictions for just a segment of the population. We give an efficient algorithm with theoretical analysis for the conditional linear regression task, which is the joint task of identifying a significant segment of the population, described by a k-DNF, along with its linear regression fit.