Overcoming Data Inequality across Domains with Semi-Supervised Domain Generalization
This work addresses data inequality across domains in machine learning applications with limited labeled data, and represents an incremental advance in domain generalization methods.
The paper tackles data inequality across domains through Semi-Supervised Domain Generalization (SSDG), in which only one domain is labeled and the others are unlabeled, and proposes ProUD, which outperforms baseline models on three benchmark datasets.
Although large datasets have driven considerable advances in machine learning, a significant disparity persists in data availability across sources and populations. This inequality across domains makes modeling difficult for those with limited data, raising profound practical and ethical concerns. In this paper, we address a representative case of the data inequality problem across domains, termed Semi-Supervised Domain Generalization (SSDG), in which only one domain is labeled while the rest are unlabeled. We propose a novel algorithm, ProUD, which effectively learns domain-invariant features via domain-aware prototypes and achieves progressive generalization via uncertainty-adaptive mixing of labeled and unlabeled domains. Experiments on three benchmark datasets demonstrate the effectiveness of ProUD, which outperforms all baseline models, including single domain generalization and semi-supervised learning methods. Source code will be released upon acceptance of the paper.
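To make the two named ingredients more concrete, here is a minimal, hypothetical sketch in plain Python. The abstract does not specify how ProUD builds its prototypes or sets its mixing ratio, so everything below is an assumption and not the authors' method. Prototypes are taken to be per-(domain, class) mean feature vectors, with pseudo-labels standing in for labels on unlabeled domains. Uncertainty is the normalized entropy of a softmax over negative squared distances to one domain's prototypes, and the weight on an unlabeled sample is simply one minus that uncertainty. All function names are illustrative.

```python
import math
from collections import defaultdict

# Hypothetical illustration only; not the ProUD implementation.


def domain_class_prototypes(features, labels, domains):
    """Mean feature vector for every (domain, class) pair.

    For unlabeled domains, `labels` is assumed to hold pseudo-labels.
    """
    sums = {}
    counts = defaultdict(int)
    for f, y, d in zip(features, labels, domains):
        key = (d, y)
        if key not in sums:
            sums[key] = [0.0] * len(f)
        sums[key] = [s + v for s, v in zip(sums[key], f)]
        counts[key] += 1
    return {k: [s / counts[k] for s in v] for k, v in sums.items()}


def prototype_probs(feature, prototypes, domain, classes, temperature=1.0):
    """Softmax over negative squared distances to one domain's class prototypes."""
    logits = []
    for c in classes:
        p = prototypes[(domain, c)]
        dist = sum((a - b) ** 2 for a, b in zip(feature, p))
        logits.append(-dist / temperature)
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def normalized_entropy(probs):
    """Prediction entropy divided by log(K), so the result lies in [0, 1]."""
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(len(probs))


def uncertainty_adaptive_mix(x_labeled, x_unlabeled, uncertainty):
    """Linearly mix two inputs; confident unlabeled samples get more weight.

    Assumed rule: weight on the unlabeled sample = 1 - uncertainty.
    """
    w = 1.0 - uncertainty
    mixed = [(1.0 - w) * a + w * b for a, b in zip(x_labeled, x_unlabeled)]
    return mixed, w
```

In this sketch, a feature that sits on one of its domain's class prototypes yields near-zero uncertainty, so the mixed input is dominated by the unlabeled sample; a uniform prediction gives uncertainty 1 and leaves the labeled sample unchanged. Whether ProUD weights or schedules its mixing this way is not stated in the abstract.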