Inherent limitations of LLMs regarding spatial information
This work addresses a problem that is critical for applications such as autonomous vehicles and assistive technologies, but its contribution is incremental: it evaluates existing models rather than proposing new solutions.
The paper investigates the limitations of large language models such as ChatGPT in spatial reasoning and navigation tasks, including 2D and 3D route planning. It introduces a novel evaluation framework with a baseline dataset to assess these capabilities and reports insights into the model's performance.
Despite the significant advances in natural language processing demonstrated by large language models such as ChatGPT, their ability to comprehend and process spatial information, especially in 2D and 3D route planning, remains notably underdeveloped. This paper investigates the inherent limitations of ChatGPT and similar models in spatial reasoning and navigation-related tasks, an area critical for applications ranging from autonomous vehicle guidance to assistive technologies for the visually impaired. We introduce a novel evaluation framework complemented by a baseline dataset crafted specifically to assess the spatial reasoning abilities of ChatGPT. The dataset is structured around three key tasks: plotting spatial points, planning routes in two-dimensional (2D) spaces, and devising pathways in three-dimensional (3D) environments. Our evaluation reveals key insights into the model's capabilities and limitations in spatial understanding.
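To make the route-planning tasks concrete, the sketch below shows one way a model-proposed route could be checked automatically. It is an illustration only, not the paper's framework: the grid representation, the unit-step movement rule, and the name `is_valid_route` are all assumptions introduced here, since the abstract does not specify how routes are encoded or scored.

```python
from typing import Iterable, Sequence, Set, Tuple

# Hypothetical encoding: (x, y) for 2D tasks, (x, y, z) for 3D tasks.
Point = Tuple[int, ...]


def is_valid_route(
    route: Sequence[Point],
    start: Point,
    goal: Point,
    obstacles: Iterable[Point],
    size: int,
) -> bool:
    """Check a proposed route on a square or cubic grid of side `size`.

    Assumed acceptance rule: the route begins at `start`, ends at `goal`,
    stays inside the grid, avoids every obstacle, and moves exactly one
    unit along a single axis at each step.
    """
    blocked: Set[Point] = set(obstacles)
    if not route or route[0] != start or route[-1] != goal:
        return False
    for point in route:
        # Reject points whose dimensionality differs from the task's.
        if len(point) != len(start):
            return False
        # Reject points outside the grid bounds.
        if any(c < 0 or c >= size for c in point):
            return False
        if point in blocked:
            return False
    for prev, cur in zip(route, route[1:]):
        # A Manhattan distance of 1 means a single axis-aligned unit step.
        if sum(abs(a - b) for a, b in zip(prev, cur)) != 1:
            return False
    return True
```

Because points are plain tuples, the same check covers both the 2D and the 3D task. A route parsed from a model's text response can be passed in directly, for example `is_valid_route([(0, 0), (1, 0), (1, 1)], (0, 0), (1, 1), [(0, 1)], 2)`.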