ML LG ME

Sep 24, 2025

Unsupervised Domain Adaptation with an Unobservable Source Subpopulation

Chao Ying, Jun Jin, Haotian Zhang, Qinglong Tian, Yanyuan Ma, Yixuan Li, Jiwei Zhao

arXiv:2509.20587v1

4.5

h-index: 1

Originality: Incremental advance

AI Analysis

This paper addresses domain adaptation when part of the source data is missing. The advance is incremental, as it builds on existing structured missingness frameworks.

The paper tackles unsupervised domain adaptation when one source subpopulation is unobservable. It shows that prediction in the target domain can still be recovered, derives the prediction models with theoretical guarantees, and reports experiments in which the method outperforms a naive benchmark that ignores the unobserved subpopulation.

We study an unsupervised domain adaptation problem where the source domain consists of subpopulations defined by the binary label $Y$ and a binary background (or environment) $A$. We focus on a challenging setting in which one such subpopulation in the source domain is unobservable. Naively ignoring this unobserved group can result in biased estimates and degraded predictive performance. Despite this structured missingness, we show that the prediction in the target domain can still be recovered. Specifically, we rigorously derive both background-specific and overall prediction models for the target domain. For practical implementation, we propose the distribution matching method to estimate the subpopulation proportions. We provide theoretical guarantees for the asymptotic behavior of our estimator, and establish an upper bound on the prediction error. Experiments on both synthetic and real-world datasets show that our method outperforms the naive benchmark that does not account for this unobservable source subpopulation.
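To make the idea of estimating subpopulation proportions by matching distributions concrete, here is a minimal sketch. It is not the paper's estimator: it assumes all four $(Y, A)$ subpopulations are observed in the source, whereas the paper handles the case where one is not, and it matches only feature means through a constrained least-squares fit. The function name `match_proportions`, the Gaussian toy data, and the penalty weight are all hypothetical choices made for illustration.

```python
import numpy as np

def match_proportions(component_feats, target_feats, penalty=1e3):
    """Estimate mixing weights w (w >= 0, sum(w) = 1) so that the weighted
    average of the component feature means is close to the target mean.
    Illustrative moment matching only; not the estimator from the paper."""
    # Columns of M are per-component feature means; t is the target feature mean.
    M = np.column_stack([np.mean(f, axis=0) for f in component_feats])
    t = np.mean(target_feats, axis=0)
    k = M.shape[1]
    # Append a heavily weighted row that pushes sum(w) toward 1.
    A = np.vstack([M, penalty * np.ones((1, k))])
    b = np.concatenate([t, [penalty]])
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    # Clip small negative values and renormalize onto the simplex.
    w = np.clip(w, 0.0, None)
    return w / w.sum()

# Toy data (assumed): four subpopulations indexed by (Y, A), Gaussian features.
rng = np.random.default_rng(0)
means = np.array([[0, 0, 0], [3, 0, 0], [0, 3, 0], [0, 0, 3]], dtype=float)
true_w = np.array([0.1, 0.2, 0.3, 0.4])  # target-domain proportions

components = [rng.normal(m, 1.0, size=(5000, 3)) for m in means]
labels = rng.choice(4, size=20000, p=true_w)   # latent subpopulation per target point
target = rng.normal(means[labels], 1.0)        # unlabeled target features

w_hat = match_proportions(components, target)
print("estimated proportions:", np.round(w_hat, 3))
```

In this toy setup the estimated weights should land close to `true_w`. Dropping one subpopulation from `components` and refitting gives a rough feel for the kind of bias the abstract attributes to naively ignoring the unobserved group.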
