AI · CL · May 22, 2025

SPaRC: A Spatial Pathfinding Reasoning Challenge

Lars Benedikt Kaesberg, Jan Philip Wahle, Terry Ruas, Bela Gipp

arXiv:2505.16686v2 · 12.46 citations · h-index: 14 · Has Code · EMNLP

Originality: Synthesis-oriented

AI Analysis

The paper addresses the problem of evaluating and improving spatial reasoning in AI models. Its contribution is incremental, since it targets a specific domain challenge.

To address the lack of datasets that test abstract, multi-step spatial reasoning, the authors introduce SPaRC, a dataset of 1,000 2D grid pathfinding puzzles. Humans solve 98.0% of the puzzles, while o4-mini, one of the best-performing reasoning models, solves only 15.8%.
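To make "2D grid pathfinding" concrete, the following minimal sketch checks only the basic well-formedness a grid path would plausibly need before any puzzle rule is considered. It assumes, hypothetically, that a path is a list of (x, y) grid vertices that must begin at a given start, end at a given exit, stay on the grid, move one orthogonal step at a time, and never revisit a vertex. The function name `is_valid_path` and these rules are illustrative assumptions; SPaRC's actual arithmetic and geometric rule constraints are not modeled here.

```python
from typing import List, Tuple

Point = Tuple[int, int]


def is_valid_path(path: List[Point], width: int, height: int,
                  start: Point, end: Point) -> bool:
    """Hypothetical basic validity check for a path on a width x height grid.

    Checks only structural rules (endpoints, bounds, unit orthogonal steps,
    no revisits); puzzle-specific rule constraints are NOT checked.
    """
    # The path must exist and connect the given start and end points.
    if not path or path[0] != start or path[-1] != end:
        return False
    seen = set()
    for i, (x, y) in enumerate(path):
        # Every vertex must lie on the grid.
        if not (0 <= x < width and 0 <= y < height):
            return False
        # A vertex visited twice means the path crosses itself.
        if (x, y) in seen:
            return False
        seen.add((x, y))
        # Consecutive vertices must be exactly one orthogonal step apart.
        if i > 0:
            px, py = path[i - 1]
            if abs(x - px) + abs(y - py) != 1:
                return False
    return True


print(is_valid_path([(0, 0), (1, 0), (1, 1)], 2, 2, (0, 0), (1, 1)))  # True
print(is_valid_path([(0, 0), (1, 1)], 2, 2, (0, 0), (1, 1)))  # False: diagonal move
```

A model answer can fail this kind of check before any rule is even evaluated, which is one way an "invalid path" can arise.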

Existing reasoning datasets saturate and fail to test abstract, multi-step problems, especially pathfinding and complex rule constraint satisfaction. We introduce SPaRC (Spatial Pathfinding Reasoning Challenge), a dataset of 1,000 2D grid pathfinding puzzles to evaluate spatial and symbolic reasoning, requiring step-by-step planning with arithmetic and geometric rules. Humans achieve near-perfect accuracy (98.0%; 94.5% on hard puzzles), while the best reasoning models, such as o4-mini, struggle (15.8%; 1.1% on hard puzzles). Models often generate invalid paths (>50% of puzzles for o4-mini), and reasoning tokens reveal they make errors in navigation and spatial logic. Unlike humans, who take longer on hard puzzles, models fail to scale test-time compute with difficulty. Allowing models to make multiple solution attempts improves accuracy, suggesting potential for better spatial reasoning with improved training and efficient test-time scaling methods. SPaRC can be used as a window into models' spatial reasoning limitations and drive research toward new methods that excel in abstract, multi-step problem-solving.
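The observation that multiple solution attempts improve accuracy is commonly quantified with pass@k, the probability that at least one of k attempts is correct. The sketch below uses the standard unbiased pass@k estimator computed from n sampled attempts, of which c are correct. This summary does not state whether SPaRC reports exactly this metric, so treat it as an illustrative assumption. The numbers in the usage lines are hypothetical and are not results from the paper.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of P(at least one of k attempts is correct),
    given n sampled attempts of which c are correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset of attempts contains at least one correct one.
        return 1.0
    # 1 minus the probability that all k chosen attempts are incorrect.
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical puzzle: 10 attempts, 2 of them valid solutions.
print(pass_at_k(10, 2, 1))            # 0.2
print(round(pass_at_k(10, 2, 5), 3))  # 0.778
```

Averaging this per-puzzle value over all puzzles gives a dataset-level score that rises with k, mirroring the effect of allowing more attempts.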

