The Position Curse: LLMs Struggle to Locate the Last Few Items in a List
Identifies a fundamental limitation in LLMs' positional reasoning that is critical for code understanding and editing in large codebases.
LLMs achieve near-perfect accuracy on 'needle in a haystack' tasks but fail to retrieve the last few items in short lists, a failure termed the Position Curse. For example, Claude Opus 4.6 misidentifies the second-to-last line in a two-line code snippet most of the time. LoRA fine-tuning on a position-focused dataset (PosBench) improves retrieval but performance remains far from saturated.
Modern large language models (LLMs) can find a needle in a haystack (locating a single relevant fact buried among hundreds of thousands of irrelevant tokens) with near-saturated accuracy, yet fail to retrieve the last few items in a short list. We call this failure the Position Curse. For instance, even in a two-line code snippet, Claude Opus 4.6 misidentifies the second-to-last line most of the time. To characterize this failure, we evaluate two complementary queries: given a position in a sequence (of letters or words), retrieve the corresponding item; and given an item, return its position. Each position is specified as a forward or backward offset from an anchor, which is either an endpoint of the list (its start or end) or another item in the list. Across both open-source and frontier closed-source models, backward retrieval substantially lags forward retrieval. To test whether this capability can be rescued by post-training, we construct PosBench, a position-focused training dataset. LoRA fine-tuning improves both forward and backward retrieval and generalizes to a held-out code-understanding benchmark (PyIndex), yet absolute performance remains far from saturated. As LLM coding agents increasingly operate over large codebases where precise indexing becomes essential for code understanding and editing, position-based retrieval emerges as a key capability for future pretraining objectives and model design.
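The two query types described above can be made concrete with a small reference implementation. The sketch below is illustrative only and is not taken from PosBench or PyIndex: the function names, the 1-based offset convention, the signed offset for item anchors, and the assumption that list items are unique are all choices made here for exposition.

```python
from typing import Sequence


def item_at(items: Sequence[str], offset: int, anchor: str = "start") -> str:
    """Position -> item query with an endpoint anchor.

    Assumption: offsets are 1-based, so offset=1 with anchor="end" is the
    last item and offset=2 with anchor="end" is the second-to-last item.
    """
    if anchor not in ("start", "end"):
        raise ValueError("anchor must be 'start' or 'end'")
    if not 1 <= offset <= len(items):
        raise IndexError("offset out of range")
    return items[offset - 1] if anchor == "start" else items[-offset]


def position_of(items: Sequence[str], target: str, anchor: str = "start") -> int:
    """Item -> position query with an endpoint anchor.

    Assumption: items are unique, so the target has exactly one position.
    Returns a 1-based offset counted from the chosen endpoint.
    """
    if anchor not in ("start", "end"):
        raise ValueError("anchor must be 'start' or 'end'")
    idx = list(items).index(target)  # raises ValueError if target is absent
    return idx + 1 if anchor == "start" else len(items) - idx


def item_relative_to(items: Sequence[str], anchor_item: str, offset: int) -> str:
    """Position -> item query anchored on another item in the list.

    Assumption: a positive offset counts forward from the anchor item and a
    negative offset counts backward.
    """
    idx = list(items).index(anchor_item) + offset
    if not 0 <= idx < len(items):
        raise IndexError("offset out of range")
    return items[idx]


letters = ["q", "d", "x", "m", "a"]
forward_item = item_at(letters, 2, anchor="start")       # second item
backward_item = item_at(letters, 2, anchor="end")        # second-to-last item
backward_pos = position_of(letters, "m", anchor="end")   # position from the end
relative_item = item_relative_to(letters, "x", -2)       # two items before "x"

# The two-line case from the abstract: the second-to-last line is the first line.
snippet = ["import os", "print(os.getcwd())"]
second_to_last = item_at(snippet, 2, anchor="end")
```

A ground-truth function of this kind is trivial to write, which is what makes the failure notable: the backward queries (`anchor="end"` or a negative offset) are the ones on which models lag, even though they are no harder to compute than the forward ones.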