CL | AI | Feb 24, 2022

DoCoGen: Domain Counterfactual Generation for Low Resource Domain Adaptation

arXiv:2202.12350v2649 citations
Originality: Incremental advance
AI Analysis

This work addresses low-resource domain adaptation for NLP practitioners by enabling data augmentation with a generator trained only on unlabeled examples. It is an incremental advance because it builds on existing generation and DA methods.

The paper tackles domain adaptation in NLP when labeled data is scarce. It proposes DoCoGen, a controllable generation algorithm that creates domain-counterfactual textual examples without requiring task labels or parallel data. Augmenting with these examples improves the accuracy of a sentiment classifier and a multi-label intent classifier in 20 and 78 DA setups, respectively, outperforming strong baselines and improving a state-of-the-art unsupervised DA algorithm.

Natural language processing (NLP) algorithms have become very successful, but they still struggle when applied to out-of-distribution examples. In this paper we propose a controllable generation approach in order to deal with this domain adaptation (DA) challenge. Given an input text example, our DoCoGen algorithm generates a domain-counterfactual textual example (D-con) that is similar to the original in all aspects, including the task label, but whose domain is changed to a desired one. Importantly, DoCoGen is trained using only unlabeled examples from multiple domains: no NLP task labels or parallel pairs of textual examples and their domain-counterfactuals are required. We show that DoCoGen can generate coherent counterfactuals consisting of multiple sentences. We use the D-cons generated by DoCoGen to augment a sentiment classifier and a multi-label intent classifier in 20 and 78 DA setups, respectively, where source-domain labeled data is scarce. Our model outperforms strong baselines and improves the accuracy of a state-of-the-art unsupervised DA algorithm.
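The augmentation step described in the abstract can be illustrated with a minimal sketch. It assumes only what the abstract states: a D-con keeps the task label of its source example and changes the domain. The names `Example`, `augment_with_dcons`, and `toy_generate_dcon` are hypothetical, as are the domain names and terms. The toy generator is a simple word swap standing in for a trained model; it is not how DoCoGen generates text.

```python
from typing import Callable, List, NamedTuple


class Example(NamedTuple):
    text: str
    label: str   # task label, e.g. a sentiment class
    domain: str  # domain the text belongs to


def augment_with_dcons(
    labeled: List[Example],
    target_domains: List[str],
    generate_dcon: Callable[[str, str], str],
) -> List[Example]:
    """Return the original examples plus one D-con per other target domain."""
    augmented = list(labeled)
    for ex in labeled:
        for domain in target_domains:
            if domain == ex.domain:
                continue  # no counterfactual needed for the example's own domain
            dcon_text = generate_dcon(ex.text, domain)
            # The D-con inherits the task label; only the domain changes.
            augmented.append(Example(dcon_text, ex.label, domain))
    return augmented


# Hypothetical toy stand-in for a trained generator: it swaps one hand-picked
# domain term for the target domain's term. Illustration only.
_TOY_TERMS = {"books": "novel", "kitchen": "blender", "dvd": "movie"}


def toy_generate_dcon(text: str, target_domain: str) -> str:
    for term in _TOY_TERMS.values():
        text = text.replace(term, _TOY_TERMS[target_domain])
    return text


source = [Example("I loved this novel, it was great.", "positive", "books")]
result = augment_with_dcons(source, ["books", "kitchen", "dvd"], toy_generate_dcon)
for ex in result:
    print(ex)
```

In a real setup, `generate_dcon` would wrap a trained controllable generator, and the augmented list would be used to train the downstream classifier in place of the scarce labeled source data alone.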

Code Implementations: 1 repo
