CV, LG | May 20, 2025

Plane Geometry Problem Solving with Multi-modal Reasoning: A Survey

arXiv:2505.14340v1 | 3 citations | h-index: 7
Originality: Synthesis-oriented
AI Analysis

This survey gives researchers in multi-modal reasoning a systematic synthesis of existing PGPS work. Its contribution is incremental, since it reviews prior studies without introducing new methods or results.

The authors conducted a survey to address the lack of a comprehensive overview in plane geometry problem solving (PGPS), categorizing methods into an encoder-decoder framework and analyzing architectural designs, challenges, and future directions.

Plane geometry problem solving (PGPS) has recently gained significant attention as a benchmark to assess the multi-modal reasoning capabilities of large vision-language models. Despite the growing interest in PGPS, the research community still lacks a comprehensive overview that systematically synthesizes recent work in PGPS. To fill this gap, we present a survey of existing PGPS studies. We first categorize PGPS methods into an encoder-decoder framework and summarize the corresponding output formats used by their encoders and decoders. Subsequently, we classify and analyze these encoders and decoders according to their architectural designs. Finally, we outline major challenges and promising directions for future research. In particular, we discuss the hallucination issues arising during the encoding phase within encoder-decoder architectures, as well as the problem of data leakage in current PGPS benchmarks.

