CV | Sep 16, 2025

EvoEmpirBench: Dynamic Spatial Reasoning with Agent-ExpVer

Pukun Zhao, Longxiang Wang, Miaowei Wang, Chen Chen, Fanqing Zhou, Haojian Huang

arXiv:2509.12718v110.23 citations | h-index: 1 | Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses AI researchers' need for better benchmarks in dynamic spatial reasoning, though it is incremental in that it builds on existing benchmark methodologies.

The authors tackled the problem of evaluating spatial reasoning in dynamic, partially observable environments by introducing two benchmarks that test models' abilities in adaptive planning and memory utilization, revealing key limitations in mainstream models.

Most existing spatial reasoning benchmarks focus on static or globally observable environments, failing to capture the challenges of long-horizon reasoning and memory utilization under partial observability and dynamic changes. We introduce two dynamic spatial benchmarks, locally observable maze navigation and match-2 elimination, that systematically evaluate models' abilities in spatial understanding and adaptive planning when local perception, environment feedback, and global objectives are tightly coupled. Each action triggers structural changes in the environment, requiring continuous updates to cognition and strategy. We further propose a subjective experience-based memory mechanism for cross-task experience transfer and validation. Experiments show that our benchmarks reveal key limitations of mainstream models in dynamic spatial reasoning and long-term memory, providing a comprehensive platform for future methodological advances. Our code and data are available at https://anonymous.4open.science/r/EvoEmpirBench-143C/.
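To make the setting in the abstract concrete, the following is a minimal sketch of a locally observable maze in which every action changes the environment's structure. It is an illustration only and not the authors' benchmark code: the class name `LocalMaze`, the `radius` parameter, and the rule that a vacated cell collapses into a wall are all assumptions chosen for brevity.

```python
from dataclasses import dataclass

WALL, FREE = "#", "."
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


@dataclass
class LocalMaze:
    """Hypothetical toy maze: the agent sees only a small window around itself."""
    grid: list[list[str]]
    pos: tuple[int, int]
    goal: tuple[int, int]
    radius: int = 1  # assumed size of the local field of view

    def _inside(self, r: int, c: int) -> bool:
        return 0 <= r < len(self.grid) and 0 <= c < len(self.grid[0])

    def observe(self) -> list[str]:
        """Return the (2*radius+1)-wide window; out-of-bounds cells read as walls."""
        r, c = self.pos
        window = []
        for dr in range(-self.radius, self.radius + 1):
            row = ""
            for dc in range(-self.radius, self.radius + 1):
                rr, cc = r + dr, c + dc
                row += self.grid[rr][cc] if self._inside(rr, cc) else WALL
            window.append(row)
        return window

    def step(self, move: str) -> tuple[list[str], bool]:
        """Apply a move; blocked moves leave the state unchanged."""
        dr, dc = MOVES[move]
        r, c = self.pos
        nr, nc = r + dr, c + dc
        if self._inside(nr, nc) and self.grid[nr][nc] == FREE:
            self.pos = (nr, nc)
            # Assumed dynamic rule: the cell just vacated becomes a wall,
            # so the agent cannot retrace its path and must replan.
            self.grid[r][c] = WALL
        return self.observe(), self.pos == self.goal


if __name__ == "__main__":
    env = LocalMaze([list("..."), list(".#."), list("...")], (0, 0), (2, 2))
    print("\n".join(env.observe()))
    for move in ["right", "right", "down", "down"]:
        obs, done = env.step(move)
    print("reached goal:", done)
```

Because the agent never sees the full grid and the grid changes after every move, a policy has to keep its own memory of visited cells and revise its plan as new observations arrive, which is the coupling of local perception, feedback, and global objective that the abstract describes.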
