Invariance assumptions for class distribution estimation
This work addresses a fundamental challenge in machine learning under dataset shift, but it appears incremental: it discusses existing assumptions without introducing a new method.
The paper tackles the estimation of class prior probabilities in a test dataset under dataset shift, where only features are observed at test time. It analyzes how invariance assumptions such as covariate shift, factorizable joint shift, and sparse joint shift facilitate this estimation.
We study the problem of class distribution estimation under dataset shift. On the training dataset, both features and class labels are observed, while on the test dataset only the features can be observed. The task is then to estimate the distribution of the class labels, i.e., the class prior probabilities, in the test dataset. Assumptions of invariance between the training joint distribution of features and labels and the test distribution can considerably facilitate this task. We discuss the assumptions of covariate shift, factorizable joint shift, and sparse joint shift and their implications for class distribution estimation.
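To illustrate how an invariance assumption can facilitate the task, consider the covariate shift case. There, the posterior class probabilities given the features are assumed to be the same in training and test, so the test class priors can be obtained by averaging the training posterior over the observed test features. The Python sketch below shows this under simplifying assumptions: the function name, the discrete toy feature, and the posterior table are hypothetical and chosen for illustration only, and the posterior is treated as known, whereas in practice it would be estimated from the training data. It is not presented as the method of the paper.

```python
import numpy as np


def class_priors_under_covariate_shift(posteriors_on_test):
    """Estimate test class priors as the mean training posterior on test features.

    posteriors_on_test: array of shape (n_test, n_classes); row i holds
    P_train(y | x_i) evaluated at the i-th test feature vector.
    Assumes covariate shift, i.e. P_test(y | x) = P_train(y | x).
    """
    p = np.asarray(posteriors_on_test, dtype=float)
    if p.ndim != 2:
        raise ValueError("expected a 2-D array of shape (n_test, n_classes)")
    # Law of total probability: P_test(y) = E_test[ P(y | X) ].
    return p.mean(axis=0)


# Hypothetical toy setting: one discrete feature x in {0, 1, 2}, two classes.
# Row x of the table holds the (assumed known) posterior P(y | x).
posterior_table = np.array([
    [0.9, 0.1],
    [0.5, 0.5],
    [0.2, 0.8],
])

# Unlabelled test features; their distribution may differ from training.
test_features = np.array([0, 0, 1, 2, 2, 2, 2, 2])

test_priors = class_priors_under_covariate_shift(posterior_table[test_features])
print(test_priors)
```

The estimate depends on the test data only through the feature distribution. This is what makes covariate shift a convenient assumption, and it is also why the estimate can be misleading when the posterior itself changes between training and test.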