LGSep 18, 2024
Unsupervised Domain Adaptation Via Data PruningAndrea Napoli, Paul White
The removal of carefully-selected examples from training data has recently emerged as an effective way of improving the robustness of machine learning models. However, the best way to select these examples remains an open question. In this paper, we consider the problem from the perspective of unsupervised domain adaptation (UDA). We propose AdaPrune, a method for UDA whereby training examples are removed to attempt to align the training distribution to that of the target data. By adopting the maximum mean discrepancy (MMD) as the criterion for alignment, the problem can be neatly formulated and solved as an integer quadratic program. We evaluate our approach on a real-world domain shift task of bioacoustic event detection. As a method for UDA, we show that AdaPrune outperforms related techniques, and is complementary to other UDA algorithms such as CORAL. Our analysis of the relationship between the MMD and model accuracy, along with t-SNE plots, validate the proposed method as a principled and well-founded way of performing data pruning.
8.4LGMay 6
Order Matters: Improving Domain Adaptation by Reordering DataAndrea Napoli, Paul White
Domain shift remains a key challenge in deploying machine learning models to the real world. Unsupervised domain adaptation (UDA) aims to address this by minimising domain discrepancy during training, but the discrepancy estimates suffer from high variance in stochastic settings, which can stifle the theoretical benefits of the method. This paper proposes Optimal Reordering of Data for Error-Reduced Estimation of Discrepancy (ORDERED), a novel unbiased stochastic variance reduction technique which reduces the discrepancy estimation error by optimising the order in which the training data are sampled. We consider two specific domain discrepancy losses (correlation alignment and the maximum mean discrepancy), formulate their stochastic estimation error as a function of the data sampling order, and propose a practical optimisation algorithm. Our simulations demonstrate reduced variance compared to related methods, and experiments on two domain shift image classification benchmarks show improved target domain accuracy.
LGDec 4, 2025
Variance Matters: Improving Domain Adaptation via Stratified SamplingAndrea Napoli, Paul White
Domain shift remains a key challenge in deploying machine learning models to the real world. Unsupervised domain adaptation (UDA) aims to address this by minimising domain discrepancy during training, but the discrepancy estimates suffer from high variance in stochastic settings, which can stifle the theoretical benefits of the method. This paper proposes Variance-Reduced Domain Adaptation via Stratified Sampling (VaRDASS), the first specialised stochastic variance reduction technique for UDA. We consider two specific discrepancy measures -- correlation alignment and the maximum mean discrepancy (MMD) -- and derive ad hoc stratification objectives for these terms. We then present expected and worst-case error bounds, and prove that our proposed objective for the MMD is theoretically optimal (i.e., minimises the variance) under certain assumptions. Finally, a practical k-means style optimisation algorithm is introduced and analysed. Experiments on three domain shift datasets demonstrate improved discrepancy estimation accuracy and target domain performance.