Domain Generalization via Optimal Transport with Metric Similarity Learning
This paper addresses domain generalization, the problem of making machine learning models perform well on unseen data distributions. The contribution appears incremental, since it builds on existing invariant feature learning approaches.
The paper tackles domain generalization by learning invariant features across multiple source domains to generalize to unseen target domains, using optimal transport with Wasserstein distance and metric learning to incorporate label information for better classification boundaries. Empirical results show the method outperforms most baselines, with ablation studies confirming the effectiveness of its components.
Generalizing knowledge to unseen domains, where data and labels are unavailable, is crucial for machine learning models. We tackle the domain generalization problem: learning from multiple source domains and generalizing to a target domain with unknown statistics. The key idea is to extract the underlying invariant features shared across all domains. Previous domain generalization approaches mainly focused on learning invariant features and stacking the learned features from each source domain to generalize to a new target domain, while ignoring the label information. This leads to indistinguishable features with an ambiguous classification boundary. One possible solution is to constrain label similarity when extracting the invariant features, and to exploit label similarities for class-specific cohesion and separation of features across domains. We therefore adopt optimal transport with the Wasserstein distance, which can constrain class-label similarity, for adversarial training, and further deploy a metric learning objective that leverages the label information to achieve a distinguishable classification boundary. Empirical results show that the proposed method outperforms most of the baselines. Ablation studies further demonstrate the effectiveness of each component of the method.
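To make the two ingredients named in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. It makes several assumptions the abstract does not confirm: the optimal transport cost is approximated with entropy-regularized Sinkhorn iterations on the primal problem (the paper uses the Wasserstein distance inside adversarial training, which is a different formulation), the ground cost is squared Euclidean distance, and the metric learning objective is stood in for by a batch-hard triplet loss. The function names `sinkhorn_cost` and `batch_hard_triplet_loss` and all parameter values are hypothetical.

```python
import numpy as np


def pairwise_sq_dists(a, b):
    """Squared Euclidean distance between every row of a and every row of b."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sum(diff ** 2, axis=-1)


def sinkhorn_cost(x, y, reg=0.1, n_iters=200):
    """Approximate OT cost between two feature sets with uniform weights.

    Assumption: entropy-regularized Sinkhorn iterations, used here only as
    an illustrative stand-in for a Wasserstein distance between the feature
    distributions of two domains.
    """
    cost = pairwise_sq_dists(x, y)
    # Rescale the cost inside the kernel so exp() does not underflow.
    scale = cost.max() if cost.max() > 0 else 1.0
    kernel = np.exp(-cost / (reg * scale))
    mu = np.full(len(x), 1.0 / len(x))  # uniform mass on source samples
    nu = np.full(len(y), 1.0 / len(y))  # uniform mass on target samples
    u = np.ones_like(mu)
    for _ in range(n_iters):
        v = nu / (kernel.T @ u)
        u = mu / (kernel @ v)
    plan = u[:, None] * kernel * v[None, :]  # approximate transport plan
    return float(np.sum(plan * cost)), plan


def batch_hard_triplet_loss(features, labels, margin=1.0):
    """Hypothetical stand-in for the metric learning objective.

    For each anchor, pull the farthest same-class sample closer than the
    nearest different-class sample by at least `margin`.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    dists = pairwise_sq_dists(features, features)
    same = labels[:, None] == labels[None, :]
    losses = []
    for a in range(len(labels)):
        pos = same[a].copy()
        pos[a] = False  # the anchor is not its own positive
        neg = ~same[a]
        if not pos.any() or not neg.any():
            continue  # skip anchors without a valid triplet
        hardest_pos = dists[a][pos].max()
        hardest_neg = dists[a][neg].min()
        losses.append(max(0.0, hardest_pos - hardest_neg + margin))
    return float(np.mean(losses)) if losses else 0.0
```

In a training loop one might add the transport cost between feature batches from two source domains to the classification loss to encourage domain-invariant features, and add the triplet term computed over the pooled batch so that same-class features cluster regardless of domain. How the actual method weights and combines these terms is not stated in the abstract.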