CL | AI | Feb 13, 2025

When the LM misunderstood the human chuckled: Analyzing garden path effects in humans and language models

DeepMind
arXiv:2502.09307v1 | 4 citations | h-index: 59 | ACL
Originality: Incremental advance
AI Analysis

This research examines where large language models fall short in sentence comprehension, a question that matters for natural language processing applications.

The study compares how well large language models (LLMs) and humans comprehend garden-path sentences. Both struggle with specific syntactic complexities, and some models correlate highly with human comprehension. Paraphrasing and text-to-image generation tasks gave results that mirror the comprehension-question results.

Modern Large Language Models (LLMs) have shown human-like abilities in many language tasks, sparking interest in comparing LLMs' and humans' language processing. In this paper, we conduct a detailed comparison of the two on a sentence comprehension task using garden-path constructions, which are notoriously challenging for humans. Based on psycholinguistic research, we formulate hypotheses on why garden-path sentences are hard, and test these hypotheses on human participants and a large suite of LLMs using comprehension questions. Our findings reveal that both LLMs and humans struggle with specific syntactic complexities, with some models showing high correlation with human comprehension. To complement our findings, we test LLM comprehension of garden-path constructions with paraphrasing and text-to-image generation tasks, and find that the results mirror the sentence comprehension question results, further validating our findings on LLM understanding of these constructions.
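
The "correlation with human comprehension" mentioned above can be illustrated with a minimal sketch. It assumes the comparison is a Pearson correlation over per-condition accuracy on the comprehension questions; the abstract does not state the exact statistic, and the accuracy values below are placeholders, not results from the paper. The names `pearson`, `human_acc`, and `model_acc` are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Placeholder accuracies, one entry per sentence condition (NOT paper data).
human_acc = [0.90, 0.70, 0.50, 0.40]
model_acc = [0.95, 0.80, 0.55, 0.50]

r = pearson(human_acc, model_acc)
print(f"human-model correlation: {r:.3f}")
```

A value of r near 1 would mean the model finds the same conditions hard that humans do, regardless of whether its overall accuracy is higher or lower.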

