LG · ML · Nov 3, 2022

Domain Adaptation under Missingness Shift

arXiv:2211.02093v3 · 15 citations · h-index: 58
Originality: Incremental advance
AI Analysis

The paper addresses domain adaptation for real-world data whose missingness patterns shift across domains, an incremental but practical extension of existing theory.

The paper tackles domain adaptation when missing-data mechanisms differ between the source and target domains. It shows that, without missingness indicators, the covariate shift assumption is violated and the optimal linear source predictor can perform arbitrarily worse on the target domain than always predicting the mean. It also proves that the optimal target predictor can be identified even when the missingness rates themselves cannot, and it provides a simple analytic adjustment for linear models that yields consistent estimates of the optimal target parameters.

Rates of missing data often depend on record-keeping policies and thus may change across times and locations, even when the underlying features are comparatively stable. In this paper, we introduce the problem of Domain Adaptation under Missingness Shift (DAMS). Here, (labeled) source data and (unlabeled) target data would be exchangeable but for different missing data mechanisms. We show that if missing data indicators are available, DAMS reduces to covariate shift. Addressing cases where such indicators are absent, we establish the following theoretical results for underreporting completely at random: (i) covariate shift is violated (adaptation is required); (ii) the optimal linear source predictor can perform arbitrarily worse on the target domain than always predicting the mean; (iii) the optimal target predictor can be identified, even when the missingness rates themselves are not; and (iv) for linear models, a simple analytic adjustment yields consistent estimates of the optimal target parameters. In experiments on synthetic and semi-synthetic data, we demonstrate the promise of our methods when assumptions hold. Finally, we discuss a rich family of future extensions.
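Results (ii) and (iv) can be illustrated with a small simulation. The sketch below is not the paper's code or its estimator. It assumes that an underreported value is recorded as zero with no indicator, that the source data are fully observed, and that the target missingness rates are known (the paper does not require the last assumption). The function names, the data-generating process, and the rates are all hypothetical choices made for illustration.

```python
import numpy as np

def simulate(n, miss_rates, beta, rng):
    """Draw clean features and labels, then zero out entries completely at random."""
    d = len(beta)
    # Illustrative choice: strongly correlated features via a shared latent factor.
    z = rng.normal(size=(n, 1))
    x = z + 0.5 * rng.normal(size=(n, d))
    y = x @ beta + 0.1 * rng.normal(size=n)
    keep = rng.random((n, d)) >= miss_rates  # True where the value is reported
    return x * keep, y                       # unreported values become 0, no indicator

def ols(x, y):
    """Least-squares weights (no intercept; all variables are zero-mean here)."""
    return np.linalg.lstsq(x, y, rcond=None)[0]

def mse(x, y, w):
    return float(np.mean((x @ w - y) ** 2))

rng = np.random.default_rng(0)
beta = np.array([1.0, -1.0])
m_source = np.array([0.0, 0.0])   # assumption: source is fully observed
m_target = np.array([0.0, 0.9])   # assumption: feature 2 is underreported in the target

xs, ys = simulate(50_000, m_source, beta, rng)
xt, yt = simulate(50_000, m_target, beta, rng)

w_source = ols(xs, ys)
w_oracle = ols(xt, yt)  # uses target labels; only a reference point

# Moment adjustment under the assumptions above (target rates known).
# With independent zeroing, E[x~_j x~_k] = keep_j * keep_k * E[x_j x_k] for j != k,
# E[x~_j^2] = keep_j * E[x_j^2], and E[x~_j y] = keep_j * E[x_j y].
keep = 1.0 - m_target
second = xs.T @ xs / len(xs)
cross = xs.T @ ys / len(xs)
second_t = second * np.outer(keep, keep)
np.fill_diagonal(second_t, np.diag(second) * keep)
w_adjusted = np.linalg.solve(second_t, cross * keep)  # no target labels used

mse_source = mse(xt, yt, w_source)
mse_mean = float(np.mean((yt - ys.mean()) ** 2))  # always predict the source label mean
mse_adjusted = mse(xt, yt, w_adjusted)

print("source predictor on target:", mse_source)
print("mean predictor on target:  ", mse_mean)
print("adjusted predictor on target:", mse_adjusted)
```

In this setup the source predictor relies on the two correlated features cancelling each other, so zeroing one of them in the target leaves a large uncancelled term and its error exceeds that of the constant mean prediction. Increasing the variance of the shared latent factor widens the gap, which is the intuition behind the "arbitrarily worse" statement.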

Code Implementations: 1 repo
