CL, AI | Jan 31, 2023

Large Language Models Can Be Easily Distracted by Irrelevant Context

Freda Shi, Xinyun Chen, Kanishka Misra, Nathan Scales, David Dohan, Ed Chi, Nathanael Schärli, Denny Zhou

DeepMind

arXiv:2302.00093v3 | 39.0 | 1063 citations | h-index: 78 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses a critical reliability issue for users of large language models in real-world applications, where inputs may contain noise.

The authors investigate how irrelevant context affects the problem-solving accuracy of large language models and find, on the GSM-IC dataset, that accuracy drops dramatically when such information is included.

Large language models have achieved impressive performance on various natural language processing tasks. However, so far they have been evaluated primarily on benchmarks where all information in the input context is relevant for solving the task. In this work, we investigate the distractibility of large language models, i.e., how the model problem-solving accuracy can be influenced by irrelevant context. In particular, we introduce Grade-School Math with Irrelevant Context (GSM-IC), an arithmetic reasoning dataset with irrelevant information in the problem description. We use this benchmark to measure the distractibility of cutting-edge prompting techniques for large language models, and find that the model performance is dramatically decreased when irrelevant information is included. We also identify several approaches for mitigating this deficiency, such as decoding with self-consistency and adding to the prompt an instruction that tells the language model to ignore the irrelevant information.
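The two mitigations named in the abstract, an instruction to ignore irrelevant information and decoding with self-consistency, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration rather than the authors' code: the instruction wording, the helper names (`add_irrelevant_sentence`, `build_prompt`, `extract_answer`, `self_consistent_answer`), the prompt layout, and the `sample` callable, which stands in for whatever model API returns one sampled completion per call. Self-consistency is approximated as a majority vote over the final numbers extracted from several sampled completions.

```python
import re
from collections import Counter
from typing import Callable, Optional

# Illustrative wording only; the exact instruction used in the paper may differ.
IGNORE_INSTRUCTION = "Feel free to ignore irrelevant information given in the question."


def add_irrelevant_sentence(problem: str, distractor: str) -> str:
    """Insert a distractor sentence just before the final (question) sentence."""
    sentences = re.split(r"(?<=[.?!])\s+", problem.strip())
    return " ".join(sentences[:-1] + [distractor, sentences[-1]])


def build_prompt(problem: str, with_instruction: bool = True) -> str:
    """Build a prompt, optionally prefixed with the ignore-irrelevant-context instruction."""
    parts = [IGNORE_INSTRUCTION] if with_instruction else []
    parts += [f"Q: {problem}", "A:"]
    return "\n".join(parts)


def extract_answer(completion: str) -> Optional[str]:
    """Take the last number in a completion as its final answer (a simplifying assumption)."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
    return numbers[-1] if numbers else None


def self_consistent_answer(
    sample: Callable[[str], str], prompt: str, n: int = 5
) -> Optional[str]:
    """Sample n completions and return the most frequent extracted answer."""
    answers = [extract_answer(sample(prompt)) for _ in range(n)]
    votes = Counter(a for a in answers if a is not None)
    return votes.most_common(1)[0][0] if votes else None
```

In use, `sample` would wrap a call to a language model with a nonzero sampling temperature, so that repeated calls yield different reasoning paths. The vote then returns the answer that most of those paths agree on, which can outvote a path that was misled by the distractor sentence.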

