Domain Generalization: A Tale of Two ERMs
This paper addresses domain generalization for machine learning practitioners. Its contribution appears incremental, since it refines existing ERM approaches under specific distributional assumptions.
The paper tackles the domain generalization problem by showing that domain-informed empirical risk minimization (ERM) outperforms standard pooled ERM when datasets exhibit posterior drift rather than covariate shift, supported by theoretical analysis and experiments on language and vision tasks.
Domain generalization (DG) is the problem of generalizing from several distributions (or domains), for which labeled training data are available, to a new test domain for which no labeled data are available. A common finding in the DG literature is that it is difficult to outperform empirical risk minimization (ERM) on the pooled training data. In this work, we argue that this finding has primarily been reported for datasets satisfying a \emph{covariate shift} assumption. When the dataset satisfies a \emph{posterior drift} assumption instead, we show that ``domain-informed ERM,'' wherein feature vectors are augmented with domain-specific information, outperforms pooling ERM. These claims are supported by a theoretical framework and experiments on language and vision tasks.
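The contrast between the two ERMs can be illustrated with a small synthetic sketch. It is not the paper's experimental setup. Everything in it is an assumption made for illustration: the scalar domain descriptor `d`, the linear model with squared loss, the synthetic data-generating process, and the premise that the descriptor is known for the test domain. The marginal of `x` is identical across domains while the relationship between `x` and `y` shifts with `d`, which mimics posterior drift.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_domain(d, n=200):
    """Draw n labeled points from a hypothetical domain with scalar descriptor d."""
    x = rng.normal(size=n)                            # same P(X) in every domain
    y = 1.5 * x + 2.0 * d + 0.1 * rng.normal(size=n)  # P(Y|X) shifts with d
    return x, y

def design(x, d=None):
    """Design matrix [x, 1], or [x, d, 1] when a domain descriptor is appended."""
    cols = [x, np.ones_like(x)]
    if d is not None:
        cols.insert(1, np.full_like(x, d))
    return np.column_stack(cols)

# Labeled training domains (assumed descriptors, evenly spaced for determinism).
train_descriptors = np.linspace(-1.0, 1.0, 8)
data = [(d, *sample_domain(d)) for d in train_descriptors]
y_train = np.concatenate([y for _, _, y in data])

# Pooled ERM: least squares on x alone, ignoring which domain a point came from.
X_pooled = np.vstack([design(x) for _, x, _ in data])
w_pooled, *_ = np.linalg.lstsq(X_pooled, y_train, rcond=None)

# Domain-informed ERM: same loss, features augmented with the descriptor.
X_informed = np.vstack([design(x, d) for d, x, _ in data])
w_informed, *_ = np.linalg.lstsq(X_informed, y_train, rcond=None)

# Unseen test domain; its labels are used only to evaluate the two predictors.
d_test = 0.9
x_test, y_test = sample_domain(d_test, n=1000)
mse_pooled = np.mean((design(x_test) @ w_pooled - y_test) ** 2)
mse_informed = np.mean((design(x_test, d_test) @ w_informed - y_test) ** 2)

print(f"pooled ERM test MSE:          {mse_pooled:.3f}")
print(f"domain-informed ERM test MSE: {mse_informed:.3f}")
```

In this toy setting the pooled predictor can only learn an average offset across training domains, so it misses the test domain's shift, while the augmented predictor can account for it. If the domains differed only in the distribution of `x` (covariate shift), the descriptor would carry no extra information about `y` given `x`, and the two fits would be expected to behave similarly.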