CV · AI · Feb 21

Synthesizing Multimodal Geometry Datasets from Scratch and Enabling Visual Alignment via Plotting Code

arXiv:2602.18745v1 · Has Code
Originality: Incremental advance
AI Analysis

This paper addresses multimodal geometry reasoning in vision-language models. It is an incremental advance that combines dataset synthesis with an explicit visual alignment technique.

The authors address limited training data and weak visual-symbolic alignment in multimodal geometry reasoning by creating GeoCode, a synthesized dataset with higher structural complexity and reasoning difficulty than existing benchmarks. Models trained on GeoCode improve consistently on multiple geometry benchmarks, which supports both the dataset and the proposed alignment strategy.

Multimodal geometry reasoning requires models to jointly understand visual diagrams and perform structured symbolic inference, yet current vision-language models struggle with complex geometric constructions due to limited training data and weak visual-symbolic alignment. We propose a pipeline for synthesizing complex multimodal geometry problems from scratch and construct a dataset named GeoCode, which decouples problem generation into symbolic seed construction, grounded instantiation with verification, and code-based diagram rendering, ensuring consistency across structure, text, reasoning, and images. Leveraging the plotting code provided in GeoCode, we further introduce code prediction as an explicit alignment objective, transforming visual understanding into a supervised structured prediction task. GeoCode exhibits substantially higher structural complexity and reasoning difficulty than existing benchmarks, while maintaining mathematical correctness through multi-stage validation. Extensive experiments show that models trained on GeoCode achieve consistent improvements on multiple geometry benchmarks, demonstrating both the effectiveness of the dataset and the proposed alignment strategy. The code will be available at https://github.com/would1920/GeoCode.
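
The three stages named in the abstract (symbolic seed construction, grounded instantiation with verification, and code-based diagram rendering) can be illustrated with a toy sketch. This is not the GeoCode implementation: the `Seed` schema, the function names, the single right-triangle template, and the output file name `diagram.png` are all assumptions made for illustration. The generated plotting code is kept as a string, which is the kind of target a code-prediction objective would supervise.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Seed:
    """Symbolic seed (hypothetical schema): right triangle ABC, right angle at B."""
    points: tuple = ("A", "B", "C")
    relation: str = "perpendicular(AB, BC)"


def instantiate(seed, rng):
    """Ground the seed with concrete leg lengths, coordinates, text, and answer."""
    ab, bc = rng.randint(3, 12), rng.randint(3, 12)
    coords = {"A": (0.0, float(ab)), "B": (0.0, 0.0), "C": (float(bc), 0.0)}
    question = (
        f"In triangle ABC, AB is perpendicular to BC, "
        f"AB = {ab} and BC = {bc}. Find AC."
    )
    return {"coords": coords, "question": question,
            "answer": math.hypot(ab, bc), "ab": ab, "bc": bc}


def verify(problem, tol=1e-9):
    """Check the coordinates against the seed relation and the stated answer."""
    a, b, c = (problem["coords"][k] for k in "ABC")
    ba = (a[0] - b[0], a[1] - b[1])
    bc = (c[0] - b[0], c[1] - b[1])
    dot = ba[0] * bc[0] + ba[1] * bc[1]  # zero when AB is perpendicular to BC
    return abs(dot) < tol and abs(math.dist(a, c) - problem["answer"]) < tol


def plotting_code(problem):
    """Emit matplotlib source as text; the string is the code-prediction target."""
    lines = ["import matplotlib.pyplot as plt", "fig, ax = plt.subplots()"]
    names = list(problem["coords"])
    for i, p in enumerate(names):
        q = names[(i + 1) % len(names)]
        (x1, y1), (x2, y2) = problem["coords"][p], problem["coords"][q]
        lines.append(f"ax.plot([{x1}, {x2}], [{y1}, {y2}], 'k-')  # segment {p}{q}")
    for p, (x, y) in problem["coords"].items():
        lines.append(f"ax.annotate('{p}', ({x}, {y}))")
    lines += ["ax.set_aspect('equal')", "ax.axis('off')",
              "fig.savefig('diagram.png')"]
    return "\n".join(lines)


def synthesize(seed_value=0):
    """Run seed -> instantiate -> verify -> render-code for one toy problem."""
    rng = random.Random(seed_value)
    problem = instantiate(Seed(), rng)
    if not verify(problem):
        raise ValueError("instantiation failed verification")
    problem["code"] = plotting_code(problem)
    return problem
```

Calling `synthesize(0)` returns a dictionary holding the question text, the numeric answer, the coordinates, and the plotting source under `"code"`. Running that source (which requires matplotlib) would render the diagram, so text, answer, and image all derive from the same verified coordinates. A training pair for the alignment objective would then be (rendered image, `problem["code"]`).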

