CL · Dec 21, 2022

Contrastive Error Attribution for Finetuned Language Models

Faisal Ladhak, Esin Durmus, Tatsunori Hashimoto

Stanford

arXiv:2212.10722v221.8226 citations · h-index: 33 · Has Code

Originality: Incremental advance

AI Analysis

This paper addresses the challenge of creating reliable NLG systems by improving training data quality. The advance is incremental, since it builds on existing error detection methods.

The paper tackles the problem of identifying and removing noisy or misannotated training data that causes hallucinations and unfaithful outputs in NLG tasks. Re-training on the cleaned data leads to a 70% reduction in entity hallucinations on the NYT dataset and a 55% reduction in semantic errors on the E2E dataset.

Recent work has identified noisy and misannotated data as a core cause of hallucinations and unfaithful outputs in Natural Language Generation (NLG) tasks. Consequently, identifying and removing these examples is a key open challenge in creating reliable NLG systems. In this work, we introduce a framework to identify and remove low-quality training instances that lead to undesirable outputs, such as faithfulness errors in text summarization. We show that existing approaches for error tracing, such as gradient-based influence measures, do not perform reliably for detecting faithfulness errors in NLG datasets. We overcome the drawbacks of existing error tracing methods through a new, contrast-based estimate that compares undesired generations to human-corrected outputs. Our proposed method can achieve a mean average precision of 0.93 at detecting known data errors across synthetic tasks with known ground truth, substantially outperforming existing approaches. Using this approach and re-training models on cleaned data leads to a 70% reduction in entity hallucinations on the NYT dataset and a 55% reduction in semantic errors on the E2E dataset.
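The contrast idea and the evaluation metric in the abstract can be illustrated with a small sketch. This is not the authors' estimator: it assumes, for illustration only, that each training example is scored by the dot product of its loss gradient with the difference between the gradient on the undesired generation and the gradient on the human-corrected output. The function names (`contrastive_scores`, `average_precision`) and the toy gradient vectors are hypothetical. Average precision is computed in the standard way over the resulting ranking; mean average precision is the mean of this value across tasks.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def contrastive_scores(
    train_grads: Sequence[Sequence[float]],
    grad_bad: Sequence[float],
    grad_corrected: Sequence[float],
) -> List[float]:
    """Score each training example against a contrast direction.

    Assumption (illustrative, not taken from the paper): the contrast is
    the gradient on the undesired output minus the gradient on the
    corrected output, and a higher dot product marks a more suspicious
    training example.
    """
    contrast = [b - c for b, c in zip(grad_bad, grad_corrected)]
    return [dot(g, contrast) for g in train_grads]


def average_precision(scores: Sequence[float], is_error: Sequence[bool]) -> float:
    """Average precision of the ranking induced by `scores` (descending)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if is_error[i]:
            hits += 1
            precision_sum += hits / rank  # precision at this recall point
    return precision_sum / hits if hits else 0.0


# Toy data: four training examples with made-up 2-d gradients.
train_grads = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.2]]
grad_bad = [1.0, 0.0]        # hypothetical gradient on the undesired output
grad_corrected = [0.0, 1.0]  # hypothetical gradient on the corrected output
known_errors = [True, False, True, False]

scores = contrastive_scores(train_grads, grad_bad, grad_corrected)
ap = average_precision(scores, known_errors)
```

In this toy setup the two known-bad examples receive the highest scores, so removing the top-ranked examples would remove exactly the erroneous ones. Real gradients are high-dimensional and would come from the finetuned model.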
