Human-in-the-loop Active Covariance Learning for Improving Prediction in Small Data Sets
This work helps statisticians and data scientists improve prediction accuracy in small-data settings, offering a more efficient alternative to existing methods for eliciting prior distributions from experts.
The paper addresses the problem of learning predictive models from small, high-dimensional data sets. It proposes a human-in-the-loop method that elicits expert knowledge about pairwise feature similarities to improve covariance estimation, which in turn improves predictive performance in high-dimensional linear regression tasks.
Learning predictive models from small high-dimensional data sets is a key problem in high-dimensional statistics. Expert knowledge elicitation can help, and a strong line of work focuses on directly eliciting informative prior distributions for parameters. This either requires considerable statistical expertise or is laborious, as the emphasis has been on the accuracy of the process rather than its efficiency. Another line of work queries the expert about the importance of features one at a time, assuming the features to be independent and hence missing covariance information. In contrast, we propose eliciting expert knowledge about pairwise feature similarities, to borrow statistical strength in the predictions, and using sequential decision making techniques to minimize the effort of the expert. Empirical results on high-dimensional linear regression tasks, where we learn the covariance structure with a Gaussian process based on sequential elicitation, demonstrate improved predictive performance on both simulated and real data.
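To make the idea of borrowing statistical strength concrete, the following is a minimal sketch and not the method of the paper: it skips the Gaussian process and the sequential query selection, and simply encodes a fixed set of expert answers ("features i and j are similar") as off-diagonal entries of a prior covariance over the regression weights. The helper names, the correlation value rho, the noise variance, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def prior_covariance_from_pairs(d, similar_pairs, rho=0.8):
    """Build a d x d prior covariance over regression weights.

    Hypothetical helper: each (i, j) in similar_pairs is an expert answer
    "features i and j are similar", encoded as prior correlation rho.
    """
    K = np.eye(d)
    for i, j in similar_pairs:
        K[i, j] = K[j, i] = rho
    # Clip negative eigenvalues so K stays a valid (PSD) covariance matrix.
    eigvals, eigvecs = np.linalg.eigh(K)
    eigvals = np.clip(eigvals, 0.0, None)
    return (eigvecs * eigvals) @ eigvecs.T


def posterior_mean_weights(X, y, K, noise_var=1.0):
    """Posterior mean of w for y = X w + noise, with prior w ~ N(0, K).

    Uses the n x n (dual) form, which is cheap when n << d.
    """
    n = X.shape[0]
    gram = X @ K @ X.T + noise_var * np.eye(n)
    return K @ X.T @ np.linalg.solve(gram, y)


# Toy data (assumed for illustration): 10 samples, 40 features,
# where features 0-3 share the same true weight.
rng = np.random.default_rng(0)
n, d = 10, 40
w_true = np.zeros(d)
w_true[:4] = 1.0
X = rng.normal(size=(n, d))
y = X @ w_true + 0.1 * rng.normal(size=n)

# Simulated expert feedback: all pairs among features 0-3 are similar.
pairs = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
K = prior_covariance_from_pairs(d, pairs, rho=0.8)

w_expert = posterior_mean_weights(X, y, K)          # prior with elicited covariance
w_ridge = posterior_mean_weights(X, y, np.eye(d))   # independent-features baseline
```

With an identity prior covariance the estimate reduces to ordinary ridge regression, which is the independent-features baseline the abstract contrasts against. The elicited off-diagonal entries let evidence about one feature's weight inform the weights of the features the expert marked as similar.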