Lost in the Prompt Order: Revealing the Limitations of Causal Attention in Language Models
The paper reveals a critical limitation of causal attention in language models, one that affects their reliability in tasks such as question answering.
The paper addresses the sensitivity of large language models to prompt structure. It shows that, in multiple-choice question answering, placing the context before the question and options (CQO) outperforms the reverse order (QOC) by over 14 percentage points across a range of models and datasets.
Large language models exhibit surprising sensitivity to the structure of the prompt, but the mechanisms underlying this sensitivity remain poorly understood. In this work, we conduct an in-depth investigation on a striking case: in multiple-choice question answering, placing context before the questions and options (CQO) outperforms the reverse order (QOC) by over 14%p, consistently over a wide range of models and datasets. Through systematic architectural analysis, we identify causal attention as the core mechanism: in QOC prompts, the causal mask prevents option tokens from attending to context, creating an information bottleneck where context becomes invisible to options.
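The masking argument in the abstract can be illustrated with a minimal sketch. It builds a standard lower-triangular causal mask and checks whether any option token may attend to any context token under each prompt order. The segment lengths and the helper names (`causal_mask`, `segment_positions`, `can_attend`) are hypothetical and chosen only for illustration; this is not the paper's analysis code.

```python
import numpy as np

def causal_mask(n):
    # mask[i, j] is True when position i may attend to position j (j <= i).
    return np.tril(np.ones((n, n), dtype=bool))

def segment_positions(layout, lengths):
    # Map each segment name to the token positions it occupies in the prompt.
    positions = {}
    start = 0
    for name in layout:
        positions[name] = list(range(start, start + lengths[name]))
        start += lengths[name]
    return positions

def can_attend(layout, lengths, source, target):
    # True if at least one token of `source` may attend to a token of `target`.
    pos = segment_positions(layout, lengths)
    n = sum(lengths[name] for name in layout)
    mask = causal_mask(n)
    return bool(mask[np.ix_(pos[source], pos[target])].any())

# Hypothetical token counts for context (C), question (Q), and options (O).
lengths = {"C": 5, "Q": 3, "O": 4}

cqo = can_attend("CQO", lengths, "O", "C")  # options come after the context
qoc = can_attend("QOC", lengths, "O", "C")  # options come before the context
print("CQO: options see context ->", cqo)
print("QOC: options see context ->", qoc)
```

Under these assumptions the check passes for CQO and fails for QOC: every context position lies after every option position in QOC, so the corresponding block of the mask is entirely False.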