CL · Nov 7, 2022

Fixing Model Bugs with Natural Language Patches

Shikhar Murty, Christopher D. Manning, Scott Lundberg, Marco Tulio Ribeiro

Microsoft, Stanford, UW

arXiv:2211.03318v224.9309 citations · h-index: 147 · Has Code

Originality: Highly original

AI Analysis

For NLP practitioners, existing ways of fixing model errors are either brittle or labor-intensive. This work offers a more efficient method that lets developers state corrections at a higher level of abstraction.

The paper tackles systematic errors in NLP models by introducing natural language patches, which let developers give corrective feedback declaratively. With just 1 to 7 patches, accuracy improves by roughly 1-4 points on slices of a sentiment analysis dataset, and F1 improves by 7 points on a relation extraction dataset.

Current approaches for fixing systematic problems in NLP models (e.g. regex patches, finetuning on more data) are either brittle, or labor-intensive and liable to shortcuts. In contrast, humans often provide corrections to each other through natural language. Taking inspiration from this, we explore natural language patches: declarative statements that allow developers to provide corrective feedback at the right level of abstraction, either overriding the model ("if a review gives 2 stars, the sentiment is negative") or providing additional information the model may lack ("if something is described as the bomb, then it is good"). We model the task of determining if a patch applies separately from the task of integrating patch information, and show that with a small amount of synthetic data, we can teach models to effectively use real patches on real data: 1 to 7 patches improve accuracy by ~1-4 points on different slices of a sentiment analysis dataset, and F1 by 7 points on a relation extraction dataset. Finally, we show that finetuning on as many as 100 labeled examples may be needed to match the performance of a small set of language patches.
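The abstract separates deciding whether a patch applies from integrating the patch's information. The Python sketch below illustrates that separation only; it is not the authors' implementation. The `Patch` fields, the substring-based `toy_gate`, and the linear interpolation in `combine` are all assumptions made for this example, and in a learned system both the gate and the patch-conditioned predictor would be models rather than hand-written functions.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Probs = Dict[str, float]  # label -> probability


@dataclass
class Patch:
    """A declarative patch. Field names are hypothetical."""
    condition: str    # e.g. "a review gives 2 stars"
    consequence: str  # e.g. "the sentiment is negative"
    trigger: str      # toy stand-in used only by toy_gate below


def toy_gate(patch: Patch, text: str) -> float:
    """Step 1: does the patch apply? Assumption: a substring match
    stands in for a learned applicability score in [0, 1]."""
    return 1.0 if patch.trigger.lower() in text.lower() else 0.0


def combine(gate: float, base: Probs, patched: Probs) -> Probs:
    """Step 2: integrate patch information. Assumption: interpolate
    between the base and patch-conditioned predictions by the gate."""
    return {
        label: gate * patched[label] + (1.0 - gate) * base[label]
        for label in base
    }


def predict_with_patch(
    patch: Patch,
    text: str,
    base_model: Callable[[str], Probs],
    patched_model: Callable[[str, Patch], Probs],
) -> Probs:
    """Run both steps: score applicability, then mix the predictions."""
    gate = toy_gate(patch, text)
    return combine(gate, base_model(text), patched_model(text, patch))


# Stub predictors with made-up numbers, for illustration only.
def base_model(text: str) -> Probs:
    return {"positive": 0.7, "negative": 0.3}


def patched_model(text: str, patch: Patch) -> Probs:
    return {"positive": 0.05, "negative": 0.95}


patch = Patch(
    condition="a review gives 2 stars",
    consequence="the sentiment is negative",
    trigger="2 stars",
)
applies = predict_with_patch(patch, "Decent food, but 2 stars.", base_model, patched_model)
ignored = predict_with_patch(patch, "Loved every minute.", base_model, patched_model)
```

Keeping the gate separate means a patch that does not apply leaves the base prediction unchanged, which is the behavior one would want from an override such as the 2-star example.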
