AI | CL | CV | LG | Oct 25, 2025

DynaSolidGeo: A Dynamic Benchmark for Genuine Spatial Mathematical Reasoning of VLMs in Solid Geometry

arXiv:2510.22340v2 | 1 citation | h-index: 3 | Has Code
Originality: Incremental advance
AI Analysis

This paper addresses the problem of evaluating spatial reasoning in VLMs on solid geometry. It offers a new benchmark designed to reduce data contamination and to assess the reasoning process, though it is an incremental extension of existing multimodal reasoning benchmarks.

The authors tackle the lack of benchmarks for genuine spatial mathematical reasoning in Vision-Language Models (VLMs) by introducing DynaSolidGeo, a dynamic benchmark whose 503 seed questions can each generate diverse instances. Their experiments reveal large performance gaps between VLMs and severe degradation in dynamic settings.

Solid geometry problem solving demands spatial mathematical reasoning that integrates spatial intelligence and symbolic reasoning. However, most existing multimodal mathematical reasoning benchmarks focus primarily on 2D plane geometry, rely on static datasets prone to data contamination and memorization, and evaluate models solely by final answers, overlooking the reasoning process. To address these limitations, we introduce DynaSolidGeo, the first dynamic benchmark for evaluating genuine spatial reasoning in Vision-Language Models (VLMs). Constructed through a semi-automatic annotation pipeline, DynaSolidGeo contains 503 expert-curated seed questions that can, in principle, dynamically generate an unbounded number of diverse multimodal text-visual instances. Beyond answer accuracy, we incorporate process evaluation based on expert-annotated reasoning chains to measure logical validity and causal coherence. Experiments across representative open-source and closed-source VLMs reveal large performance gaps, severe degradation in dynamic settings, and poor performance on tasks requiring high-level spatial intelligence, such as mental rotation and visualization. The code and dataset are available at https://zgca-ai4edu.github.io/DynaSolidGeo/.
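To make the idea of a seed question concrete, here is a minimal sketch of how one parameterized seed could yield many instances with matching ground-truth answers. It is an illustration only, not the authors' pipeline: the `Instance` class, `cuboid_diagonal_seed`, and `generate` are hypothetical names, and the sketch produces text only, whereas the benchmark's instances also include rendered figures.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    """One generated problem: question text plus its ground-truth answer."""
    question: str
    answer: float


def cuboid_diagonal_seed(rng: random.Random) -> Instance:
    """Hypothetical seed question: space diagonal of a randomly sized cuboid."""
    # Sample fresh edge lengths so each instance differs from the last.
    a, b, c = (rng.randint(2, 12) for _ in range(3))
    question = (
        f"A cuboid has edge lengths {a}, {b} and {c}. "
        "Find the length of its space diagonal."
    )
    # The answer is recomputed from the sampled parameters, never stored.
    answer = (a * a + b * b + c * c) ** 0.5
    return Instance(question, answer)


def generate(seed_fn, n: int, base_seed: int = 0) -> list[Instance]:
    """Produce n instances; one RNG per index keeps the output reproducible."""
    return [seed_fn(random.Random(base_seed + i)) for i in range(n)]


if __name__ == "__main__":
    for inst in generate(cuboid_diagonal_seed, 3):
        print(inst.question, "->", round(inst.answer, 3))
```

Because the numeric parameters are resampled for every instance, a model cannot succeed by recalling a memorized answer to a fixed question, which is the contamination concern the abstract raises about static datasets.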

