AI · CG · Dec 18, 2025

Do Multi-Agents Solve Better Than Single? Evaluating Agentic Frameworks for Diagram-Grounded Geometry Problem Solving and Reasoning

arXiv:2512.16698v1 · h-index: 29 · Has Code
Originality: Incremental advance
AI Analysis

The paper addresses how to choose between agentic framework designs for multimodal geometry reasoning, offering insights for researchers and practitioners, though the contribution is incremental because it compares existing approaches.

The paper examines whether multi-agent designs outperform single-agent ones for diagram-grounded geometry problem solving. It finds that multi-agent pipelines consistently improve performance for open-source models, with gains such as +6.8 points for Qwen-2.5-VL (7B) on Geometry3K, but show mixed results for closed-source models.

Diagram-grounded geometry problem solving is a critical benchmark for multimodal large language models (MLLMs), yet the benefits of multi-agent design over single-agent remain unclear. We systematically compare single-agent and multi-agent pipelines on four visual math benchmarks: Geometry3K, MathVerse, OlympiadBench, and We-Math. For open-source models, multi-agent consistently improves performance. For example, Qwen-2.5-VL (7B) gains +6.8 points and Qwen-2.5-VL (32B) gains +3.3 on Geometry3K, and both Qwen-2.5-VL variants see further gains on OlympiadBench and We-Math. In contrast, the closed-source Gemini-2.0-Flash generally performs better in single-agent mode on classic benchmarks, while multi-agent yields only modest improvements on the newer We-Math dataset. These findings show that multi-agent pipelines provide clear benefits for open-source models and can assist strong proprietary systems on newer, less familiar benchmarks, but agentic decomposition is not universally optimal. All code, data, and reasoning files are available at https://github.com/faiyazabdullah/Interpreter-Solver
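To make the comparison concrete, here is a minimal sketch of the two pipeline shapes and of how a gain "in points" could be computed. It is not the authors' implementation: the `Model` callable, the prompts, the interpreter-then-solver split (suggested only by the repository name), the choice to pass the diagram to the solver, and exact-match scoring are all assumptions made for illustration.

```python
from typing import Callable, Sequence

# Hypothetical model interface (an assumption, not from the paper):
# takes a text prompt and a diagram reference, returns the model's text output.
Model = Callable[[str, str], str]


def single_agent(model: Model, diagram: str, question: str) -> str:
    """One call: the model sees the diagram and answers directly."""
    return model(f"Solve: {question}", diagram)


def multi_agent(interpreter: Model, solver: Model, diagram: str, question: str) -> str:
    """Two calls: first describe the diagram, then solve using that description.

    Passing the diagram to the solver as well is an assumption of this sketch.
    """
    description = interpreter("Describe the geometric facts in this diagram.", diagram)
    return solver(f"Facts: {description}\nSolve: {question}", diagram)


def accuracy(predictions: Sequence[str], answers: Sequence[str]) -> float:
    """Percentage of predictions that exactly match the reference answers."""
    correct = sum(p.strip() == a.strip() for p, a in zip(predictions, answers))
    return 100.0 * correct / len(answers)


def gain_in_points(multi_acc: float, single_acc: float) -> float:
    """Absolute difference in accuracy, in percentage points."""
    return multi_acc - single_acc
```

Under these assumptions, a reported gain such as +6.8 on Geometry3K would correspond to `gain_in_points(multi_acc, single_acc)` evaluated over the same problem set for both pipelines.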

