Balance-Subsampled Stable Prediction
This paper addresses model-agnostic distribution shift in machine learning, a common practical problem, but the approach appears incremental because it builds on existing theory for stable prediction.
The paper tackles prediction instability caused by distribution shift between training and test data. It proposes a balance-subsampled stable prediction algorithm that significantly outperforms baseline methods in accuracy and stability on synthetic and real-world datasets.
In machine learning, it is commonly assumed that training and test data share the same population distribution. However, this assumption is often violated in practice because sample selection bias may induce a distribution shift from training data to test data. Such a model-agnostic distribution shift usually leads to prediction instability across unknown test data. In this paper, we propose a novel balance-subsampled stable prediction (BSSP) algorithm based on the theory of fractional factorial design. It isolates the clear effect of each predictor from the confounding variables. A design-theoretic analysis shows that the proposed method can reduce the confounding effects among predictors induced by the distribution shift, and hence improve both the accuracy of parameter estimation and prediction stability. Numerical experiments on both synthetic and real-world data sets demonstrate that our BSSP algorithm significantly outperforms the baseline methods for stable prediction across unknown test data.
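To make the idea of balanced subsampling concrete, the following is a minimal sketch and not the BSSP algorithm itself. It assumes binary predictors coded as -1/+1 and a simple half-fraction two-level design in which the last factor is the product of the others. The function names `half_fraction_design` and `balance_subsample`, the simulated data, and the 0.8 agreement probability are all hypothetical choices made for illustration. The sketch draws the same number of training rows for every run of the design, so that the predictors in the subsample are mutually orthogonal even though they are correlated in the full sample.

```python
import itertools
import numpy as np

def half_fraction_design(k):
    """Two-level design with k basic factors plus one generated factor
    equal to the product of the basic factors (a half fraction of the
    full factorial in k + 1 factors). Illustrative choice of generator."""
    base = np.array(list(itertools.product([-1, 1], repeat=k)))
    generated = base.prod(axis=1, keepdims=True)
    return np.hstack([base, generated])

def balance_subsample(X, design, rng):
    """Return row indices of X with the same number of rows per design run."""
    groups = [np.flatnonzero((X == run).all(axis=1)) for run in design]
    m = min(len(g) for g in groups)
    if m == 0:
        raise ValueError("some design run has no matching sample")
    return np.concatenate(
        [rng.choice(g, size=m, replace=False) for g in groups]
    )

# Simulated training data with correlated binary predictors (assumption:
# x2 and x3 each agree with x1 with probability 0.8).
rng = np.random.default_rng(0)
n = 4000
x1 = rng.choice([-1, 1], size=n)
x2 = np.where(rng.random(n) < 0.8, x1, -x1)
x3 = np.where(rng.random(n) < 0.8, x1, -x1)
X = np.column_stack([x1, x2, x3])

design = half_fraction_design(2)       # 4 runs, 3 factors
idx = balance_subsample(X, design, rng)
Xs = X[idx]

# Correlation between x1 and x2 in the full sample versus the normalized
# Gram matrix of the balanced subsample (identity when columns are orthogonal).
full_corr = np.corrcoef(X, rowvar=False)[0, 1]
sub_gram = Xs.T @ Xs / len(Xs)
print("full-sample corr(x1, x2):", round(float(full_corr), 3))
print("subsample Gram matrix:\n", sub_gram)
```

Because every design run contributes equally, the main-effect columns of the subsample inherit the orthogonality of the design. The cost is a smaller sample: the subsample size is limited by the rarest level combination in the training data.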