CL · LG · Jun 21, 2022

Plug and Play Counterfactual Text Generation for Model Robustness

arXiv:2206.10429v1 · 4 citations · h-index: 23
Originality: Incremental advance
AI Analysis

This work addresses the need for flexible, automated testing of NLP models to improve their reliability, though it builds incrementally on prior plug-and-play methods.

The paper tackles the problem of generating controlled counterfactual test cases for NLP models. It proposes CASPer, a plug-and-play framework that steers generation with attribute models without retraining. The resulting text is fluent and diverse, and using it for data augmentation makes the tested models more robust.

Generating counterfactual test cases is an important backbone for testing NLP models and making them as robust and reliable as traditional software. When generating test cases, a desired property is the ability to control generation flexibly, so as to test for a large variety of failure cases and to explain and repair them in a targeted manner. In this direction, prior work has made significant progress by manually writing rules for generating controlled counterfactuals. However, this approach requires heavy manual supervision and lacks the flexibility to easily introduce new controls. Motivated by the impressive flexibility of the plug-and-play approach of PPLM, we propose bringing the plug-and-play framework to the counterfactual test-case generation task. We introduce CASPer, a plug-and-play counterfactual generation framework that generates test cases satisfying goal attributes on demand. Our plug-and-play model can steer the test-case generation process given any attribute model, without requiring attribute-specific training of the model. In experiments, we show that CASPer effectively generates counterfactual text that follows the steering provided by an attribute model while also being fluent, diverse, and preserving the original content. We also show that the counterfactuals generated by CASPer can be used to augment the training data, thereby fixing the test model and making it more robust.
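The abstract describes steering generation with an attribute model without retraining. The Python sketch below illustrates the general PPLM-style idea on toy numbers; it is not CASPer's implementation, and everything in it (the 4-token LM head `W_lm`, the hidden state `h`, the logistic attribute model `w_attr`/`b_attr`, and the function `steer_hidden`) is hypothetical. It runs gradient ascent on a frozen attribute classifier's log-probability with respect to a hidden activation, with a KL penalty that keeps the next-token distribution close to the unsteered one (a stand-in for fluency and content preservation). As a simplification, it perturbs a single hidden vector with plain gradient steps, whereas PPLM perturbs the transformer's cached key-value history and normalizes its gradients.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a 1-D logit vector.
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def steer_hidden(h, W_lm, w_attr, b_attr, step_size=0.1, kl_scale=1.0, n_steps=30):
    """Hypothetical toy steering: nudge hidden state h so a frozen logistic
    attribute model p(a|h) rises, while a KL term keeps the next-token
    distribution close to the unsteered one. No weights are trained;
    only the activation is perturbed."""
    p_orig = softmax(W_lm @ h)          # unsteered next-token distribution
    delta = np.zeros_like(h)            # perturbation added to the hidden state
    for _ in range(n_steps):
        h_new = h + delta
        # Gradient of log sigmoid(w.h + b) with respect to h.
        s = w_attr @ h_new + b_attr
        grad_attr = (1.0 - sigmoid(s)) * w_attr
        # Gradient of KL(q || p_orig) w.r.t. logits z, with q = softmax(z), is
        # q * (log q - log p_orig - KL); chain rule through z = W_lm @ h_new.
        q = softmax(W_lm @ h_new)
        log_ratio = np.log(q) - np.log(p_orig)
        kl = float(q @ log_ratio)
        grad_kl = W_lm.T @ (q * (log_ratio - kl))
        # Gradient ascent on log p(a|h) - kl_scale * KL.
        delta += step_size * (grad_attr - kl_scale * grad_kl)
    return h + delta

# Usage with hypothetical toy values.
W_lm = np.array([[1.0, 0.0, 0.5],
                 [0.0, 1.0, -0.5],
                 [0.5, 0.5, 0.0],
                 [-1.0, 0.2, 1.0]])     # 4-token LM head
h = np.array([0.2, -0.1, 0.4])          # hidden state before steering
w_attr, b_attr = np.array([1.0, -1.0, 0.5]), -0.5  # frozen attribute model

h_steered = steer_hidden(h, W_lm, w_attr, b_attr)
print(sigmoid(w_attr @ h + b_attr), sigmoid(w_attr @ h_steered + b_attr))
```

Because only `delta` changes, swapping in a different attribute model needs no retraining of the language-model weights, which is the property the abstract calls plug-and-play. Raising `kl_scale` trades attribute strength for staying closer to the original next-token distribution.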
