gLOP: the global and Local Penalty for Capturing Predictive Heterogeneity
This work addresses the problem of focusing data collection: by identifying hard-to-predict instances, it helps researchers and practitioners decide where to collect more data. The contribution is incremental, however, as it builds on existing multitask learning methods.
The paper tackles predictive heterogeneity in supervised learning, where some instances are predictive outliers that standard models handle poorly. It introduces gLOP, a penalized regression framework for multitask learning that identifies these outliers, and reports empirical results on synthetic and health data.
When faced with a supervised learning problem, we hope to have rich enough data to build a model that predicts future instances well. In practice, however, problems can exhibit predictive heterogeneity: most instances might be relatively easy to predict, while others might be predictive outliers for which a model trained on the entire dataset does not perform well. Identifying these outliers can help focus future data collection. We present gLOP, the global and Local Penalty, a framework for capturing predictive heterogeneity and identifying predictive outliers. gLOP is based on penalized regression for multitask learning, which improves learning by leveraging the training signal from related tasks. We give two optimization algorithms for gLOP: one that is space-efficient, and another that gives the full regularization path. We also characterize uniqueness in terms of the data and tuning parameters, and present empirical results on synthetic data and on two health research problems.
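To make the idea of a global and a local penalty concrete, the following is a minimal sketch under stated assumptions. The abstract does not give the objective, so the form used here is assumed for illustration only: each task k has coefficients g + l_k, with an L1 penalty on the shared vector g and a separate L1 penalty on each task-specific deviation l_k. The solver is plain proximal gradient descent (ISTA), not either of the two algorithms the paper describes, and the names `fit_global_local`, `lam_global`, and `lam_local` are hypothetical.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of a weighted L1 norm: shrink each entry toward 0 by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def fit_global_local(Xs, ys, lam_global, lam_local, n_iter=5000):
    """ISTA on an ASSUMED objective (illustrative, not taken from the paper):

        sum_k 0.5 * ||y_k - X_k (g + l_k)||^2
            + lam_global * ||g||_1 + lam_local * sum_k ||l_k||_1
    """
    K, p = len(Xs), Xs[0].shape[1]
    n = sum(X.shape[0] for X in Xs)
    # Stack everything into one lasso-style design: [global block | local blocks].
    A = np.zeros((n, p * (K + 1)))
    row = 0
    for k, X in enumerate(Xs):
        m = X.shape[0]
        A[row:row + m, :p] = X                        # shared (global) coefficients
        A[row:row + m, p * (k + 1):p * (k + 2)] = X   # task k's local coefficients
        row += m
    y = np.concatenate(ys)
    # Per-coordinate penalty weights: global first, then all local blocks.
    lam = np.concatenate([np.full(p, lam_global), np.full(K * p, lam_local)])
    # Step size 1/L, where L is the squared spectral norm of A.
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    w = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ w - y)
        w = soft_threshold(w - step * grad, step * lam)
    return w[:p], w[p:].reshape(K, p)


# Synthetic example: five tasks share g_true; the last task is a predictive outlier.
rng = np.random.default_rng(0)
K, n_k = 5, 50
g_true = np.array([1.0, -2.0, 0.5])
shift = np.array([3.0, 0.0, -3.0])
Xs, ys = [], []
for k in range(K):
    X = rng.standard_normal((n_k, 3))
    beta = g_true + (shift if k == K - 1 else 0.0)
    Xs.append(X)
    ys.append(X @ beta + 0.1 * rng.standard_normal(n_k))

g_hat, L_hat = fit_global_local(Xs, ys, lam_global=1.0, lam_local=5.0)
local_size = np.abs(L_hat).sum(axis=1)  # L1 size of each task's local deviation
print("global estimate:", np.round(g_hat, 2))
print("local L1 size per task:", np.round(local_size, 2))
```

In this sketch, a task whose local deviation is large relative to the others is a candidate predictive outlier. Setting `lam_local` larger than `lam_global` encourages shared signal to be absorbed by the global vector, so that only genuinely different tasks pay for a nonzero local term; that choice is a design assumption of the example, not a recommendation from the paper.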