Personalized Prediction By Learning Halfspace Reference Classes Under Well-Behaved Distribution
This work addresses the need for explainable machine learning in domains like healthcare, though it appears incremental, as it builds on existing concepts of personalized prediction and sparse classifiers.
The paper tackles the problem of achieving accurate and interpretable predictions in high-stakes applications by proposing a personalized prediction scheme that learns a sparse linear classifier per query, and it proves an upper bound of O(opt^{1/4}) for this model with homogeneous halfspace subsets.
In machine learning applications, predictive models are trained to serve future queries across the entire data distribution. However, real-world data often demands excessively complex models to achieve competitive performance, sacrificing interpretability. Hence, the growing deployment of machine learning models in high-stakes applications, such as healthcare, motivates the search for methods for accurate and explainable predictions. This work proposes a Personalized Prediction scheme, where an easy-to-interpret predictor is learned per query. In particular, we wish to produce a "sparse linear" classifier with competitive performance specifically on some sub-population that includes the query point. The goal of this work is to study the PAC-learnability of this prediction model for sub-populations represented by "halfspaces" in a label-agnostic setting. We first give a distribution-specific PAC-learning algorithm for learning reference classes for personalized prediction. By leveraging both the reference-class learning algorithm and a list learner of sparse linear representations, we prove the first upper bound, $O(\mathrm{opt}^{1/4})$, for personalized prediction with sparse linear classifiers and homogeneous halfspace subsets. We also evaluate our algorithms on a variety of standard benchmark data sets.
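To make the prediction model concrete, the following is a minimal brute-force sketch of the idea described in the abstract: for a given query, search for a homogeneous halfspace that contains the query together with a sparse linear classifier that has low error on the points inside that halfspace. It is not the paper's algorithm and carries no $O(\mathrm{opt}^{1/4})$ guarantee. The function name `personalized_predict`, the random sampling of candidate halfspaces, the restriction to 1-sparse classifiers, and the parameters `n_halfspaces` and `min_mass` are all assumptions made for illustration.

```python
import numpy as np

def personalized_predict(X, y, x_query, n_halfspaces=200, min_mass=0.2, seed=0):
    """Illustrative search (hypothetical, not the paper's method): choose a
    homogeneous halfspace containing x_query and a 1-sparse linear classifier
    with the lowest empirical error among the points inside that halfspace."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    best = None  # (conditional error, halfspace normal w, coordinate j, sign s)
    for _ in range(n_halfspaces):
        # Sample a random direction and orient it so the query is inside.
        w = rng.standard_normal(d)
        w /= np.linalg.norm(w)
        if w @ x_query < 0:
            w = -w
        inside = X @ w >= 0
        # Skip reference classes that cover too little of the sample.
        if inside.mean() < min_mass:
            continue
        Xs, ys = X[inside], y[inside]
        # Candidate 1-sparse classifiers: predict 1 when s * x_j > 0.
        for j in range(d):
            for s in (1.0, -1.0):
                pred = (s * Xs[:, j] > 0).astype(int)
                err = np.mean(pred != ys)
                if best is None or err < best[0]:
                    best = (err, w, j, s)
    if best is None:
        return None
    err, w, j, s = best
    return {
        "label": int(s * x_query[j] > 0),
        "halfspace": w,
        "coord": j,
        "sign": s,
        "cond_error": float(err),
    }

# Synthetic data: labels follow sign(x_0) where x_1 >= 0 and are random elsewhere.
rng = np.random.default_rng(1)
X = rng.standard_normal((2000, 3))
clean = X[:, 1] >= 0
y = np.where(clean, (X[:, 0] > 0).astype(int), rng.integers(0, 2, size=2000))
x_query = np.array([1.0, 2.0, 0.0])
out = personalized_predict(X, y, x_query)
```

The returned dictionary exposes both the reference class (`halfspace`) and the classifier (`coord`, `sign`), so the prediction for the query can be explained in terms of a single feature and the sub-population on which that rule was evaluated.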