CL · Jun 24, 2024

Evaluating the Ability of Large Language Models to Reason about Cardinal Directions

arXiv:2406.16528v1 · 17 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses a specific reasoning challenge for AI researchers, but it is incremental as it applies existing evaluation methods to a new domain.

The paper evaluates large language models' ability to reason about cardinal directions. It finds that while the models perform well on simple recall tasks, none reliably determines the correct direction in more complex scenarios, even at a temperature setting of zero.

We investigate the abilities of a representative set of Large Language Models (LLMs) to reason about cardinal directions (CDs). To do so, we create two datasets: the first, co-created with ChatGPT, focuses largely on recall of world knowledge about CDs; the second is generated from a set of templates, comprehensively testing an LLM's ability to determine the correct CD given a particular scenario. The templates allow for a number of degrees of variation, such as the means of locomotion of the agent involved and whether the scenario is set in the first, second or third person. Our experiments show that although LLMs are able to perform well on the simpler dataset, on the second, more complex dataset no LLM is able to reliably determine the correct CD, even with a temperature setting of zero.
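To make the template idea concrete, here is a minimal sketch of how a templated question set with varying person and means of locomotion could be expanded and scored. The template wording, the filler lists, and the names `generate_items` and `accuracy` are all hypothetical and chosen for illustration; they are not taken from the paper's dataset or code.

```python
from itertools import product

# Hypothetical fillers for illustration; the paper's actual templates are not reproduced here.
PERSONS = {"first": "I am", "second": "You are", "third": "She is"}
LOCOMOTION = ["walking", "cycling", "driving"]
OPPOSITE = {"north": "south", "south": "north", "east": "west", "west": "east"}

TEMPLATE = "{subject} {locomotion} along the {shore} shore of a lake. In which direction is the lake?"


def generate_items():
    """Expand the template over every person, locomotion, and shore combination."""
    items = []
    for (person, subject), locomotion, shore in product(PERSONS.items(), LOCOMOTION, OPPOSITE):
        items.append({
            "person": person,
            "locomotion": locomotion,
            "question": TEMPLATE.format(subject=subject, locomotion=locomotion, shore=shore),
            "answer": OPPOSITE[shore],  # on the east shore, the lake lies to the west
        })
    return items


def accuracy(predictions, items):
    """Exact-match accuracy of predicted directions against the gold answers."""
    correct = sum(p.strip().lower() == item["answer"] for p, item in zip(predictions, items))
    return correct / len(items)


items = generate_items()
print(len(items))            # 3 persons x 3 locomotion modes x 4 shores
print(items[0]["question"])
```

Keeping the person and locomotion fields on each item would make it possible to break accuracy down by those variations, which is the kind of comparison the templated dataset is described as supporting.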
