CL | AI | Jun 1, 2025

Un-considering Contextual Information: Assessing LLMs' Understanding of Indexical Elements

arXiv:2506.01089v1 | 1 citation | h-index: 6 | Has Code | ACL
Originality: Incremental advance
AI Analysis

This work addresses, for NLP researchers, the problem of assessing LLMs' understanding of linguistic indexicals and provides a new benchmark dataset. It is incremental, however, because it extends existing coreference resolution evaluations to a specific linguistic category.

This study evaluated LLM performance on coreference resolution with indexicals like I, you, here, and tomorrow, revealing that LLMs perform well with some indexicals (e.g., I) but struggle with others (e.g., you, here, tomorrow), and that syntactic cues have mixed effects on performance.

Large Language Models (LLMs) have demonstrated impressive performance in tasks related to coreference resolution. However, previous studies mostly assessed LLM performance on coreference resolution with nouns and third-person pronouns. This study evaluates LLM performance on coreference resolution with indexicals such as I, you, here, and tomorrow, which pose unique challenges due to their linguistic properties. We present the first study examining how LLMs interpret indexicals in English, releasing the English Indexical Dataset with 1,600 multiple-choice questions. We evaluate leading LLMs, including GPT-4o, Claude 3.5 Sonnet, Gemini 1.5 Pro, and DeepSeek V3. Our results reveal that LLMs perform impressively with some indexicals (I) while struggling with others (you, here, tomorrow), and that syntactic cues (e.g., quotation) improve LLM performance with some indexicals while reducing it with others. Code and data are available at: https://github.com/metehanoguzz/LLMs-Indexicals-English.
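One way to tabulate results of the kind reported in the abstract is to score each multiple-choice answer and break accuracy down by indexical and by syntactic condition. The sketch below is a minimal illustration under stated assumptions: the `MCItem` record, its `indexical`, `condition`, `gold`, and `predicted` fields, and condition labels such as `"quotation"` are all hypothetical and are not the schema of the released English Indexical Dataset or the authors' evaluation code. The demo rows are invented and are not results from the paper.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class MCItem:
    """One scored multiple-choice question (hypothetical schema, for illustration only)."""
    indexical: str   # e.g. "I", "you", "here", "tomorrow"
    condition: str   # hypothetical syntactic-cue label, e.g. "quotation" / "no_quotation"
    gold: str        # correct option label, e.g. "A"
    predicted: str   # option label the model chose


def accuracy_by_group(items):
    """Return {(indexical, condition): fraction of items whose prediction matches gold}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for item in items:
        key = (item.indexical, item.condition)
        total[key] += 1
        # Compare option labels case-insensitively, ignoring surrounding whitespace.
        if item.predicted.strip().upper() == item.gold.strip().upper():
            correct[key] += 1
    # Every key in `total` has at least one item, so the division is safe.
    return {key: correct[key] / total[key] for key in total}


if __name__ == "__main__":
    # Toy rows invented for illustration; these are NOT results from the paper.
    demo = [
        MCItem("I", "quotation", "A", "A"),
        MCItem("I", "quotation", "B", "B"),
        MCItem("you", "quotation", "A", "B"),
        MCItem("you", "no_quotation", "C", "C"),
        MCItem("tomorrow", "no_quotation", "B", "A"),
    ]
    for (indexical, condition), acc in accuracy_by_group(demo).items():
        print(f"{indexical:>8} | {condition:<12} | {acc:.2f}")
```

Keying on the (indexical, condition) pair makes it easy to compare, for example, quoted and unquoted contexts for the same indexical, which is the kind of contrast the abstract describes.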

Code Implementations: 1 repo
