CL AI LG | Mar 25, 2025

Linguistic Blind Spots of Large Language Models

arXiv:2503.19260v1 | 13 citations | h-index: 9 | Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics

Originality: Incremental advance

AI Analysis

The paper addresses concerns about how reliably LLMs perform detailed linguistic analysis in AI applications. It is an incremental advance, building on existing critiques of LLM understanding.

The paper examines how large language models (LLMs) underperform on fine-grained linguistic annotation tasks, such as detecting nouns, verbs, and clauses. It finds that even the most capable model tested, Llama3-70b, makes notable errors, such as misidentifying embedded clauses and confusing complex nominals with clauses.

Large language models (LLMs) are the foundation of many AI applications today. However, despite their remarkable proficiency in generating coherent text, questions linger regarding their ability to perform fine-grained linguistic annotation tasks, such as detecting nouns or verbs, or identifying more complex syntactic structures like clauses in input texts. These tasks require precise syntactic and semantic understanding of input text, and when LLMs underperform on specific linguistic structures, it raises concerns about their reliability for detailed linguistic analysis and whether their (even correct) outputs truly reflect an understanding of the inputs. In this paper, we empirically study the performance of recent LLMs on fine-grained linguistic annotation tasks. Through a series of experiments, we find that recent LLMs show limited efficacy in addressing linguistic queries and often struggle with linguistically complex inputs. We show that the most capable LLM (Llama3-70b) makes notable errors in detecting linguistic structures, such as misidentifying embedded clauses, failing to recognize verb phrases, and confusing complex nominals with clauses. Our results provide insights to inform future advancements in LLM design and development.
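To make the annotation setting concrete, here is a minimal sketch of how a model's answer to a query such as "list the nouns in this sentence" could be scored against a gold annotation. The abstract does not state which metric the authors use, so the choice of multiset precision, recall, and F1 is an assumption, and the function name `score_annotations`, the example sentence, and both word lists are hypothetical.

```python
from collections import Counter


def score_annotations(predicted, gold):
    """Return (precision, recall, f1) for two lists of annotated words.

    Words are compared case-insensitively as multisets, so a word that
    occurs twice in the gold list must be predicted twice to count fully.
    This metric is an illustrative assumption, not the paper's protocol.
    """
    pred_counts = Counter(word.lower() for word in predicted)
    gold_counts = Counter(word.lower() for word in gold)

    # Multiset intersection keeps the minimum count of each shared word.
    overlap = sum((pred_counts & gold_counts).values())

    precision = overlap / sum(pred_counts.values()) if pred_counts else 0.0
    recall = overlap / sum(gold_counts.values()) if gold_counts else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical example sentence:
#   "The report that the committee approved caused a delay."
gold_nouns = ["report", "committee", "delay"]
# Hypothetical model output that mistakes the embedded-clause verb
# "approved" for a noun and misses "delay".
predicted_nouns = ["report", "committee", "approved"]

precision, recall, f1 = score_annotations(predicted_nouns, gold_nouns)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

A word-level comparison like this suits noun or verb detection. Clause-level tasks would likely need span matching instead, which this sketch does not attempt.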
