CL | Oct 22, 2022

NeuroCounterfactuals: Beyond Minimal-Edit Counterfactuals for Richer Data Augmentation

Phillip Howard, Gadi Singer, Vasudev Lal, Yejin Choi, Swabha Swayamdipta

arXiv:2210.12365v124.6303 citations | h-index: 111 | Has Code

Originality: Incremental advance

AI Analysis

This paper addresses the challenge of generating richer counterfactuals for data augmentation in NLP, with potential for more robust generalization. The advance appears incremental because it builds on existing counterfactual methods.

The paper tackles the problem of simplistic counterfactuals in NLP data augmentation by introducing NeuroCounterfactuals, which allow larger edits that yield naturalistic generations. Augmenting training data with these generations improves sentiment classification and outperforms manually curated counterfactuals in some settings.

While counterfactual data augmentation offers a promising step towards robust generalization in natural language processing, producing a set of counterfactuals that offer valuable inductive bias for models remains a challenge. Most existing approaches for producing counterfactuals, manual or automated, rely on small perturbations via minimal edits, resulting in simplistic changes. We introduce NeuroCounterfactuals, designed as loose counterfactuals, allowing for larger edits which result in naturalistic generations containing linguistic diversity, while still bearing similarity to the original document. Our novel generative approach bridges the benefits of constrained decoding with those of language model adaptation for sentiment steering. Training data augmentation with our generations results in both in-domain and out-of-domain improvements for sentiment classification, outperforming even manually curated counterfactuals under select settings. We further present detailed analyses to show the advantages of NeuroCounterfactuals over approaches involving simple, minimal edits.
