CL, Jul 16, 2025

Evaluating the Ability of Large Language Models to Reason about Cardinal Directions, Revisited

arXiv:2507.12059v2, 2 citations, h-index: 4
Originality: Synthesis-oriented
AI Analysis

This work addresses the problem of assessing spatial reasoning capabilities in AI models, but it is incremental as it extends prior research presented at COSIT-24.

The study evaluated 28 large language models on their ability to reason about cardinal directions using a benchmark with varied templates, finding that even newer models fail to reliably answer all questions correctly.

We investigate the abilities of 28 Large Language Models (LLMs) to reason about cardinal directions (CDs) using a benchmark generated from a set of templates, extensively testing an LLM's ability to determine the correct CD given a particular scenario. The templates allow for a number of degrees of variation, such as the means of locomotion of the agent involved and whether the scenario is set in the first, second or third person. Even the newer Large Reasoning Models are unable to reliably determine the correct CD for all questions. This paper summarises and extends earlier work presented at COSIT-24.
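As a rough illustration of how a template-driven benchmark of this kind can be put together, the sketch below expands a single template over locomotion, grammatical person, and scenario geometry, and scores a model against the known answers. The template wording, the lake scenario, the variation values, and the names `generate_questions` and `accuracy` are all assumptions made for this example; they are not taken from the paper's benchmark.

```python
from itertools import product

# All values below are hypothetical and chosen only for illustration.
# An agent on the north shore of a lake has the lake to its south, and so on.
OPPOSITE = {"north": "south", "south": "north", "east": "west", "west": "east"}
LOCOMOTION = ("walking", "cycling", "driving")
PERSON = {
    "first": ("I am", "me"),
    "second": ("You are", "you"),
    "third": ("She is", "her"),
}
TEMPLATE = (
    "{subject} {locomotion} along the {shore} shore of a lake. "
    "In which direction is the lake from {obj}?"
)


def generate_questions():
    """Expand the template over every combination of the variation axes."""
    questions = []
    for (person, (subject, obj)), locomotion, shore in product(
        PERSON.items(), LOCOMOTION, OPPOSITE
    ):
        questions.append(
            {
                "person": person,
                "locomotion": locomotion,
                "question": TEMPLATE.format(
                    subject=subject, locomotion=locomotion, shore=shore, obj=obj
                ),
                # The gold answer follows from the scenario geometry alone.
                "answer": OPPOSITE[shore],
            }
        )
    return questions


def accuracy(questions, predict):
    """Fraction of questions for which predict(question) matches the gold CD."""
    correct = sum(
        predict(q["question"]).strip().lower() == q["answer"] for q in questions
    )
    return correct / len(questions)
```

Here `predict` stands in for whatever function sends a question to a model and returns its answer as a string. Because every question records which variation produced it, accuracy can also be broken down per axis, for instance by person or by means of locomotion.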
