CL | AI | Jul 15, 2025

Modeling Understanding of Story-Based Analogies Using Large Language Models

arXiv:2507.10957v1 | 2 citations | h-index: 4 | CogSci
Originality: Incremental advance
AI Analysis

This work addresses the problem of assessing LLM reasoning capabilities, a question relevant to researchers in AI and cognitive science. It is an incremental advance that builds on prior research.

The study evaluated how well large language models (LLMs) perform on story-based analogical mapping tasks compared to humans, finding that they can capture semantic similarities but lack robust human-like reasoning, with performance varying by model size and architecture.

Recent advancements in Large Language Models (LLMs) have brought them closer to matching human cognition across a variety of tasks. How well do these models align with human performance in detecting and mapping analogies? Prior research has shown that LLMs can extract similarities from analogy problems but lack robust human-like reasoning. Building on Webb, Holyoak, and Lu (2023), the current study focused on a story-based analogical mapping task and conducted a fine-grained evaluation of LLM reasoning abilities compared to human performance. First, it explored the semantic representation of analogies in LLMs, using sentence embeddings to assess whether they capture the similarity between the source and target texts of an analogy, and the dissimilarity between the source and distractor texts. Second, it investigated the effectiveness of explicitly prompting LLMs to explain analogies. Throughout, we examine whether LLMs exhibit similar performance profiles to those observed in humans by evaluating their reasoning at the level of individual analogies, and not just at the level of overall accuracy (as prior studies have done). Our experiments include evaluating the impact of model size (8B vs. 70B parameters) and performance variation across state-of-the-art model architectures such as GPT-4 and LLaMA3. This work advances our understanding of the analogical reasoning abilities of LLMs and their potential as models of human reasoning.
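The embedding comparison described in the abstract can be illustrated with a minimal sketch. It assumes cosine similarity as the similarity measure, which the abstract does not specify, and it uses made-up three-dimensional vectors in place of real sentence embeddings. The helper names `cosine_similarity` and `prefers_target` are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def prefers_target(
    source: Sequence[float],
    target: Sequence[float],
    distractor: Sequence[float],
) -> bool:
    """True if the source embedding is closer to the target than to the distractor."""
    return cosine_similarity(source, target) > cosine_similarity(source, distractor)


# Assumption: toy vectors standing in for sentence embeddings of three stories.
# Real embeddings would come from an embedding model and have many more dimensions.
source = [0.9, 0.1, 0.3]
target = [0.8, 0.2, 0.4]
distractor = [0.1, 0.9, 0.2]

print(prefers_target(source, target, distractor))
```

Applying a check like this to each analogy separately, instead of averaging over all of them, is one way to obtain the item-level profile that the abstract contrasts with overall accuracy.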

