CL · AI · Oct 23, 2023

Evaluating Spatial Understanding of Large Language Models

DeepMind, Stanford
arXiv:2310.14540v3 · 71 citations · h-index: 33
Originality: Incremental advance
AI Analysis

This work examines how much grounded knowledge LLMs capture, a question relevant to both researchers and practitioners. It is an incremental advance because it builds on prior studies of implicit representations.

The paper evaluates whether large language models (LLMs) implicitly capture spatial relationships. The authors design natural-language navigation tasks and test GPT-3.5-turbo, GPT-4, and Llama2 series models on spatial structures including square, hexagonal, and triangular grids, rings, and trees. Performance varies substantially across structures, and the models' errors reflect both spatial and non-spatial factors.

Large language models (LLMs) show remarkable capabilities across a variety of tasks. Despite the models only seeing text in training, several recent studies suggest that LLM representations implicitly capture aspects of the underlying grounded concepts. Here, we explore LLM representations of a particularly salient kind of grounded knowledge -- spatial relationships. We design natural-language navigation tasks and evaluate the ability of LLMs, in particular GPT-3.5-turbo, GPT-4, and Llama2 series models, to represent and reason about spatial structures. These tasks reveal substantial variability in LLM performance across different spatial structures, including square, hexagonal, and triangular grids, rings, and trees. In extensive error analysis, we find that LLMs' mistakes reflect both spatial and non-spatial factors. These findings suggest that LLMs appear to capture certain aspects of spatial structure implicitly, but room for improvement remains.
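To make the idea of a natural-language navigation task concrete, the sketch below builds one square-grid question with its ground-truth answer, plus a ring variant where positions wrap around. It is only an illustration: the prompt wording, the object names, and the helper names (`follow_grid`, `follow_ring`, `grid_prompt`) are assumptions made here, not the paper's actual prompts or code.

```python
# Illustrative sketch only: wording, object names, and helper names are
# assumptions, not the paper's actual protocol.

GRID_MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
RING_MOVES = {"clockwise": 1, "counterclockwise": -1}


def follow_grid(grid, start, moves):
    """Return the object reached on a square grid after applying moves from start."""
    row, col = start
    for move in moves:
        d_row, d_col = GRID_MOVES[move]
        row, col = row + d_row, col + d_col
        if not (0 <= row < len(grid) and 0 <= col < len(grid[0])):
            raise ValueError(f"move {move!r} leaves the grid")
    return grid[row][col]


def follow_ring(ring, start, moves):
    """Return the object reached on a ring; positions wrap modulo its length."""
    pos = start
    for move in moves:
        pos = (pos + RING_MOVES[move]) % len(ring)
    return ring[pos]


def grid_prompt(grid, start, moves):
    """Render a square-grid task as a question (hypothetical wording)."""
    rows = [f"Row {i + 1}: " + ", ".join(row) + "." for i, row in enumerate(grid)]
    path = ", then ".join(f"one step {m}" for m in moves)
    row, col = start
    return (
        f"Objects are arranged on a {len(grid)} by {len(grid[0])} grid, "
        "listed from top to bottom, left to right. "
        + " ".join(rows)
        + f" You start at the {grid[row][col]}. You move {path}. What do you find?"
    )


grid = [["apple", "book", "candle"],
        ["drum", "eagle", "fork"],
        ["guitar", "hat", "igloo"]]
moves = ["right", "down", "down", "left"]

print(grid_prompt(grid, (0, 0), moves))
print("expected answer:", follow_grid(grid, (0, 0), moves))
```

A model's reply to the printed prompt could then be compared against the value returned by `follow_grid`. Swapping in `follow_ring` changes only the underlying structure, which is the kind of variation the abstract describes.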

Code Implementations: 1 repo
