ML LG · Apr 19, 2018

Effects of sampling skewness of the importance-weighted risk estimator on model selection

arXiv:1804.07344v1

Originality: Synthesis-oriented

AI Analysis

The paper addresses a specific issue for machine learning practitioners dealing with sample selection bias. It is incremental, however, because it builds on known limitations of importance weighting.

The paper studies the skewness of the sampling distribution of importance-weighted risk estimators. It shows that, for small sample sizes in sample selection bias settings, the estimator overestimates the risk for most datasets (the body of the sampling distribution) and severely underestimates it for datasets in the tail. Used for model selection, these errors lead to suboptimal regularization parameters.
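The skewness effect can be explored in a toy setting. The sketch below rests on illustrative assumptions and does not reproduce the paper's experiments. Target inputs follow N(0, 1). A point is selected for training with probability s(x) = 1 for x ≥ 0 and 0.2 otherwise, so the importance weight is w(x) = Z / s(x) with Z = 0.6. The quantity being estimated, over many small training sets, is the risk of the fixed predictor h(x) = 0 on labels y = x under squared loss. All function names are hypothetical. The mean of the estimates should land close to the true risk of 1, which reflects unbiasedness. The median and the fraction of overestimates show how far a typical single dataset can be from that value.

```python
import random
import statistics

# Toy sample-selection-bias setup (illustrative assumption, not the paper's data):
#   target density p_T(x) = N(0, 1)
#   each target point is kept with selection probability s(x):
#       s(x) = 1.0 if x >= 0 else 0.2
# The selected (training) density is p_S(x) = p_T(x) * s(x) / Z with
#   Z = P_T(selected) = 0.5 * 1.0 + 0.5 * 0.2 = 0.6,
# so the importance weight is w(x) = p_T(x) / p_S(x) = Z / s(x).
Z = 0.6


def selection_prob(x: float) -> float:
    return 1.0 if x >= 0 else 0.2


def importance_weight(x: float) -> float:
    return Z / selection_prob(x)


# Fixed hypothesis h(x) = 0 evaluated on labels y = x with squared loss,
# so the per-point loss is x**2 and the true target risk is E_T[x^2] = 1.
TRUE_RISK = 1.0


def draw_selected_sample(n: int, rng: random.Random) -> list[float]:
    """Rejection-sample n points from p_S: draw from p_T, keep with prob s(x)."""
    sample = []
    while len(sample) < n:
        x = rng.gauss(0.0, 1.0)
        if rng.random() < selection_prob(x):
            sample.append(x)
    return sample


def iw_risk(sample: list[float]) -> float:
    """Importance-weighted risk estimate: (1/n) * sum_i w(x_i) * loss(x_i)."""
    return sum(importance_weight(x) * x * x for x in sample) / len(sample)


def sampling_distribution(n: int, n_datasets: int, seed: int = 0) -> list[float]:
    """Compute the estimate on many independent small training sets."""
    rng = random.Random(seed)
    return [iw_risk(draw_selected_sample(n, rng)) for _ in range(n_datasets)]


estimates = sampling_distribution(n=10, n_datasets=20_000)
mean_est = statistics.fmean(estimates)
median_est = statistics.median(estimates)
frac_over = sum(e > TRUE_RISK for e in estimates) / len(estimates)

print(f"true risk               : {TRUE_RISK:.3f}")
print(f"mean of estimates       : {mean_est:.3f}")
print(f"median of estimates     : {median_est:.3f}")
print(f"fraction overestimating : {frac_over:.3f}")
```

In this particular configuration the skew should run to the right: most datasets underestimate and a few overestimate heavily. That is the opposite direction to the one the paper reports for its own settings. The sign depends on whether large weights coincide with large or small losses. The sketch therefore shows only that an unbiased estimator can still be wrong for most individual datasets.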

Importance-weighting is a popular and well-researched technique for dealing with sample selection bias and covariate shift. It has desirable characteristics such as unbiasedness, consistency, and low computational complexity. However, weighting can have a detrimental effect on an estimator as well. In this work, we empirically show that the sampling distribution of an importance-weighted estimator can be skewed. For sample selection bias settings, and for small sample sizes, the importance-weighted risk estimator produces overestimates for datasets in the body of the sampling distribution, i.e., the majority of cases, and large underestimates for datasets in the tail of the sampling distribution. These over- and underestimates of the risk lead to suboptimal regularization parameters when used for importance-weighted validation.
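Importance-weighted validation, the model-selection procedure named in the abstract, can be sketched under the following assumptions. Target inputs follow N(0, 1). A point is selected with probability s(x) = 1 for x ≥ 0 and 0.2 otherwise, so w(x) = Z / s(x) with Z = 0.6. Labels follow a hypothetical nonlinear model y = x + 0.5x² + noise. The learner is a misspecified one-dimensional ridge model trained with importance weights. For each candidate λ, the model is fit on the training set and scored by its importance-weighted risk on a held-out set, and the λ with the lowest score is selected. This toy model's target risk has the closed form (1 − β)² + 1.75, so the selected λ can be compared with the one that is actually best on the target distribution. The function names and the data model are illustrative, not taken from the paper.

```python
import random

# Toy selection-bias setup (assumption, not the paper's experiments):
# target p_T(x) = N(0, 1); a target point is selected for labelling with
# probability s(x) = 1.0 if x >= 0 else 0.2, so Z = P_T(selected) = 0.6 and
# the importance weight is w(x) = Z / s(x).
Z = 0.6


def selection_prob(x: float) -> float:
    return 1.0 if x >= 0 else 0.2


def weight(x: float) -> float:
    return Z / selection_prob(x)


def draw_selected(n: int, rng: random.Random) -> list[tuple[float, float]]:
    """Rejection-sample n labelled points; labels follow a hypothetical
    nonlinear model y = x + 0.5 * x**2 + N(0, 1) noise."""
    data = []
    while len(data) < n:
        x = rng.gauss(0.0, 1.0)
        if rng.random() < selection_prob(x):
            data.append((x, x + 0.5 * x * x + rng.gauss(0.0, 1.0)))
    return data


def fit_weighted_ridge(data: list[tuple[float, float]], lam: float) -> float:
    """1-D ridge without intercept, trained with importance weights:
    argmin_b sum_i w(x_i) * (y_i - b * x_i)**2 + lam * b**2."""
    num = sum(weight(x) * x * y for x, y in data)
    den = sum(weight(x) * x * x for x, _ in data) + lam
    return num / den


def iw_risk(beta: float, data: list[tuple[float, float]]) -> float:
    """Importance-weighted squared-loss risk on held-out data."""
    return sum(weight(x) * (y - beta * x) ** 2 for x, y in data) / len(data)


def select_lambda(lambdas: list[float], train, val) -> float:
    """Importance-weighted validation: pick the lambda whose fitted model has
    the smallest importance-weighted risk on the validation set."""
    return min(lambdas, key=lambda lam: iw_risk(fit_weighted_ridge(train, lam), val))


def target_risk(beta: float) -> float:
    # Closed form for this toy model: E_T[(y - beta*x)^2] = (1 - beta)^2 + 0.75 + 1.
    return (1.0 - beta) ** 2 + 1.75


rng = random.Random(1)
lambdas = [0.0, 0.1, 0.3, 1.0, 3.0, 10.0]
train, val = draw_selected(10, rng), draw_selected(10, rng)
chosen = select_lambda(lambdas, train, val)
oracle = min(lambdas, key=lambda lam: target_risk(fit_weighted_ridge(train, lam)))
print(f"lambda from IW validation: {chosen}, lambda minimising target risk: {oracle}")
```

Rerunning with different seeds shows how often the importance-weighted choice departs from the oracle choice when the validation set is this small.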
