CL · CY · Nov 2, 2023

People Make Better Edits: Measuring the Efficacy of LLM-Generated Counterfactually Augmented Data for Harmful Language Detection

arXiv:2311.01270v3 · 140 citations · h-index: 43
Originality: Incremental advance
AI Analysis

This work addresses the need for efficient data augmentation that reduces NLP models' dependence on spurious features in social computing tasks. It is an incremental advance because it builds on existing CAD methods.

The study evaluated whether generative NLP models can automatically produce Counterfactually Augmented Data (CADs) that improve robustness in harmful language detection. Manual CADs proved most effective, with ChatGPT-generated CADs a close second; the automated methods often fell short because their edits were too small to flip the original label.

NLP models are used in a variety of critical social computing tasks, such as detecting sexist, racist, or otherwise hateful content. Therefore, it is imperative that these models are robust to spurious features. Past work has attempted to tackle such spurious features using training data augmentation, including Counterfactually Augmented Data (CADs). CADs introduce minimal changes to existing training data points and flip their labels; training on them may reduce model dependency on spurious features. However, manually generating CADs can be time-consuming and expensive. Hence, in this work, we assess whether this task can be automated using generative NLP models. We automatically generate CADs using Polyjuice, ChatGPT, and Flan-T5, and evaluate their usefulness in improving model robustness compared to manually generated CADs. Testing both model performance on multiple out-of-domain test sets and the efficacy of individual data points, we find that while manual CADs are still the most effective, CADs generated by ChatGPT come a close second. One key reason for the lower performance of automated methods is that the changes they introduce are often insufficient to flip the original label.
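The abstract's central diagnostic is whether a counterfactual is a minimal edit that actually flips the label. A minimal sketch of how one might summarize a batch of generated CADs along those two axes follows. It assumes each pair already carries a label for the edited text (for instance, from human annotation); `CADPair`, `token_edit_distance`, and `summarize` are hypothetical names for illustration, not the paper's code, and word-level Levenshtein distance is only one possible measure of edit size.

```python
from dataclasses import dataclass


@dataclass
class CADPair:
    """Hypothetical record: an original text, its counterfactual, and both labels."""
    original: str
    counterfactual: str
    original_label: str
    counterfactual_label: str  # assumed to come from annotators, not a model


def token_edit_distance(a: str, b: str) -> int:
    """Word-level Levenshtein distance: insertions, deletions, substitutions."""
    x, y = a.split(), b.split()
    prev = list(range(len(y) + 1))
    for i, xi in enumerate(x, start=1):
        curr = [i] + [0] * len(y)
        for j, yj in enumerate(y, start=1):
            curr[j] = min(
                prev[j] + 1,                # delete xi
                curr[j - 1] + 1,            # insert yj
                prev[j - 1] + (xi != yj),   # substitute (free if tokens match)
            )
        prev = curr
    return prev[-1]


def summarize(pairs: list[CADPair]) -> dict[str, float]:
    """Fraction of pairs whose label flipped, and mean edit size in tokens."""
    if not pairs:
        raise ValueError("need at least one pair")
    flipped = sum(p.original_label != p.counterfactual_label for p in pairs)
    edits = sum(token_edit_distance(p.original, p.counterfactual) for p in pairs)
    return {"flip_rate": flipped / len(pairs), "mean_token_edits": edits / len(pairs)}


# Toy, hand-labelled examples: both edits are one token long, but only the
# first changes the meaning enough to flip the label.
pairs = [
    CADPair("women are too emotional to lead",
            "children are too emotional to lead", "sexist", "not sexist"),
    CADPair("women are too emotional to lead",
            "women are much too emotional to lead", "sexist", "sexist"),
]
print(summarize(pairs))  # {'flip_rate': 0.5, 'mean_token_edits': 1.0}
```

The second pair illustrates the failure mode the abstract describes: an edit can be minimal without being sufficient, so edit size and label flip need to be checked separately.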

Code Implementations: 1 repo