SpatialSim: Recognizing Spatial Configurations of Objects with Graph Neural Networks
This work addresses a fundamental challenge in spatial cognition for AI systems, with potential applications in robotics and computer vision, though its contribution to geometric reasoning benchmarks is incremental.
The paper tackles the problem of recognizing geometric spatial configurations of objects independently of viewpoint. It introduces SpatialSim, a novel benchmark with Identification and Comparison tasks, and shows that Graph Neural Networks outperform a less relational baseline (Deep Sets) and an unstructured one (Multi-Layer Perceptrons) on these tasks.
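As a non-learned reference point for what "invariant to viewpoint" can mean, the sketch below treats two configurations as equivalent when one maps onto the other by a rotation, a uniform scaling and a translation, and measures this with a standard orthogonal Procrustes (Umeyama-style) alignment. This is not the paper's method, and several choices are assumptions: configurations are (n, 2) arrays of object positions, row i of one configuration corresponds to row i of the other, reflections do not count as a change of viewpoint, and the helper names `similarity_residual` and `same_configuration` are hypothetical.

```python
import numpy as np

def similarity_residual(A, B):
    """Relative residual after optimally aligning A onto B with a rotation,
    a uniform scale and a translation (reflections excluded).

    A, B: (n, 2) arrays; row i of A is assumed to match row i of B.
    Assumes B is not a single repeated point (non-degenerate).
    """
    A0 = A - A.mean(axis=0)                  # remove translation
    B0 = B - B.mean(axis=0)
    U, S, Vt = np.linalg.svd(A0.T @ B0)
    # Force a proper rotation (det = +1) so mirror images are not matched.
    d = np.sign(np.linalg.det(U @ Vt))
    D = np.diag([1.0, d])
    R = U @ D @ Vt
    s = np.trace(D @ np.diag(S)) / np.sum(A0 ** 2)   # optimal uniform scale
    return np.linalg.norm(s * A0 @ R - B0) / np.linalg.norm(B0)

def same_configuration(A, B, tol=1e-6):
    """Hypothetical equivalence test: same shape up to a similarity transform."""
    return similarity_residual(A, B) < tol

rng = np.random.default_rng(0)
A = rng.uniform(-1, 1, size=(4, 2))
theta = 0.7
Rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
B = 2.5 * A @ Rot + np.array([3.0, -1.0])    # same configuration, new viewpoint
C = B.copy()
C[0] += 0.5                                  # one object displaced
print(same_configuration(A, B))  # True
print(same_configuration(A, C))  # False
```

An explicit geometric check like this needs known correspondences and a hand-chosen transformation group; the benchmark instead asks models to learn such equivalence classes from examples.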
Recognizing precise geometrical configurations of groups of objects is a key capability of human spatial cognition, yet it has been little studied in the deep learning literature so far. In particular, a fundamental problem is how a machine can learn and compare classes of geometric spatial configurations that are invariant to the point of view of an external observer. In this paper we make two key contributions. First, we propose SpatialSim (Spatial Similarity), a novel geometrical reasoning benchmark, and argue that progress on this benchmark would pave the way towards a general solution to this challenge in the real world. The benchmark is composed of two tasks, Identification and Comparison, each instantiated at increasing levels of difficulty. Second, we study how the relational inductive biases of fully-connected message-passing Graph Neural Networks (MPGNNs) help solve these tasks, and show their advantages over less relational baselines such as Deep Sets and unstructured models such as Multi-Layer Perceptrons. Finally, we highlight the current limits of GNNs on these tasks.
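The structural difference between the three model families can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' architecture: each object is a feature vector, random weights stand in for trained parameters, the GNN performs a single round of message passing with sum aggregation on the complete graph, and the names `deep_set`, `mpgnn` and `mlp` are hypothetical. The Deep Set encodes each object in isolation before pooling, the MPGNN computes a message for every ordered pair of objects, and the MLP consumes a flattened input of fixed size and fixed order.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

# Hypothetical sizes: N_OBJ objects, each with D_IN features
# (e.g. 2D position, size, colour); D_H hidden units.
N_OBJ, D_IN, D_H = 5, 6, 16

# Random weights stand in for trained parameters.
W_phi = rng.normal(size=(D_IN, D_H))
W_rho = rng.normal(size=(D_H, 1))
W_edge = rng.normal(size=(2 * D_IN, D_H))
W_node = rng.normal(size=(D_IN + D_H, D_H))
W_out = rng.normal(size=(D_H, 1))
W_mlp1 = rng.normal(size=(N_OBJ * D_IN, D_H))
W_mlp2 = rng.normal(size=(D_H, 1))

def deep_set(X):
    """Deep Sets: encode each object independently, sum-pool, then decode."""
    h = relu(X @ W_phi)                      # (n, D_H); objects never interact
    return (h.sum(axis=0) @ W_rho).item()

def mpgnn(X):
    """One round of message passing on the complete graph, then sum readout."""
    n = X.shape[0]
    # All ordered pairs (receiver, sender) with receiver != sender.
    recv, send = np.nonzero(~np.eye(n, dtype=bool))
    # Each message sees both endpoints, so it can encode pairwise relations.
    msgs = relu(np.concatenate([X[recv], X[send]], axis=1) @ W_edge)
    agg = np.zeros((n, D_H))
    np.add.at(agg, recv, msgs)               # sum incoming messages per node
    h = relu(np.concatenate([X, agg], axis=1) @ W_node)
    return (h.sum(axis=0) @ W_out).item()

def mlp(X):
    """Unstructured baseline: flatten a fixed-size, ordered input."""
    return (relu(X.reshape(-1) @ W_mlp1) @ W_mlp2).item()

X = rng.normal(size=(N_OBJ, D_IN))
perm = np.array([1, 2, 3, 4, 0])             # reorder the objects
print(np.isclose(deep_set(X), deep_set(X[perm])))  # True
print(np.isclose(mpgnn(X), mpgnn(X[perm])))        # True
print(np.isclose(mlp(X), mlp(X[perm])))            # not guaranteed to match
```

Both set-based models give the same output when the objects are reordered, and both accept any number of objects; the MLP is tied to `N_OBJ` objects in a fixed order. Only the MPGNN's pairwise messages see two objects at once, which is one way to read the relational inductive bias the abstract refers to.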