CL · CY · Jun 28, 2022

Flexible text generation for counterfactual fairness probing

arXiv:2206.13757v1632 citations · h-index: 45
Originality: Incremental advance
AI Analysis

This work addresses fairness testing in text-based classifiers for researchers and practitioners, offering an incremental improvement over existing counterfactual generation methods.

The paper tackles the problem of generating complex counterfactuals for fairness testing in text classifiers. It introduces a counterfactual generation task and applies large language models (LLMs) to it, showing that the LLM-based method produces more nuanced counterfactuals than existing wordlist- or template-based approaches. It demonstrates the method's value by using the generated counterfactuals to evaluate a toxicity classifier on the Civil Comments dataset.

A common approach for testing fairness issues in text-based classifiers is through the use of counterfactuals: does the classifier output change if a sensitive attribute in the input is changed? Existing counterfactual generation methods typically rely on wordlists or templates, producing simple counterfactuals that don't take into account grammar, context, or subtle sensitive attribute references, and could miss issues that the wordlist creators had not considered. In this paper, we introduce a task for generating counterfactuals that overcomes these shortcomings, and demonstrate how large language models (LLMs) can be leveraged to make progress on this task. We show that this LLM-based method can produce complex counterfactuals that existing methods cannot, comparing the performance of various counterfactual generation methods on the Civil Comments dataset and showing their value in evaluating a toxicity classifier.
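To make the baseline concrete, here is a minimal sketch of wordlist-based counterfactual probing as the abstract describes it: swap a sensitive-attribute term and check whether the classifier's score moves. It is not the paper's implementation. The wordlist `SWAPS`, the helper names, and the deliberately biased `toy_score` function (a stand-in for a real toxicity classifier) are all assumptions made for illustration.

```python
import re
from typing import Callable, Dict, List

# Hypothetical wordlist, for illustration only: each identity term maps to
# the terms it may be swapped with. Real wordlists are far larger.
SWAPS: Dict[str, List[str]] = {
    "christian": ["muslim", "jewish"],
    "muslim": ["christian", "jewish"],
    "jewish": ["christian", "muslim"],
}


def wordlist_counterfactuals(text: str) -> List[str]:
    """Return one counterfactual per (matched term, substitute) pair."""
    results: List[str] = []
    for term, substitutes in SWAPS.items():
        # Whole-word, case-insensitive match on the listed term only.
        pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
        if pattern.search(text):
            for substitute in substitutes:
                results.append(pattern.sub(substitute, text))
    return results


def max_score_gap(text: str, score: Callable[[str], float]) -> float:
    """Largest absolute score change between the text and its counterfactuals."""
    base = score(text)
    gaps = (abs(score(cf) - base) for cf in wordlist_counterfactuals(text))
    return max(gaps, default=0.0)


def toy_score(text: str) -> float:
    """Assumed stand-in for a toxicity classifier; biased on purpose."""
    return 0.9 if "muslim" in text.lower() else 0.1


if __name__ == "__main__":
    explicit = "My Muslim neighbor is kind."
    implicit = "He goes to the mosque every Friday."
    print(wordlist_counterfactuals(explicit))
    print(max_score_gap(explicit, toy_score))
    # No listed term appears, so the wordlist produces nothing to test.
    print(wordlist_counterfactuals(implicit))
```

The sketch also shows the shortcomings the abstract names: the substitution ignores capitalization and grammar, and a sentence that refers to a sensitive attribute only indirectly (the mosque example) yields no counterfactuals at all, so any bias on such inputs goes untested.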

