AI CG | Dec 18, 2025

Do Multi-Agents Solve Better Than Single? Evaluating Agentic Frameworks for Diagram-Grounded Geometry Problem Solving and Reasoning

Mahbub E Sobhani, Md. Faiyaz Abdullah Sayeedi, Mohammad Nehad Alam, Proma Hossain Progga, Swakkhar Shatabda

arXiv:2512.16698v1 | h-index: 29 | Has Code

Originality: Incremental advance

AI Analysis

The paper addresses how to choose among agentic framework designs for multimodal reasoning in geometry, offering insights for researchers and practitioners. The advance is incremental because it compares existing approaches.

The paper tackles the unclear benefits of multi-agent versus single-agent designs for diagram-grounded geometry problem solving. It finds that multi-agent pipelines consistently improve performance for open-source models, for example +6.8 points for Qwen-2.5-VL (7B) on Geometry3K, but yield mixed results for closed-source models.

Diagram-grounded geometry problem solving is a critical benchmark for multimodal large language models (MLLMs), yet the benefits of multi-agent design over single-agent remain unclear. We systematically compare single-agent and multi-agent pipelines on four visual math benchmarks: Geometry3K, MathVerse, OlympiadBench, and We-Math. For open-source models, multi-agent consistently improves performance. For example, Qwen-2.5-VL (7B) gains +6.8 points and Qwen-2.5-VL (32B) gains +3.3 on Geometry3K, and both Qwen-2.5-VL variants see further gains on OlympiadBench and We-Math. In contrast, the closed-source Gemini-2.0-Flash generally performs better in single-agent mode on classic benchmarks, while multi-agent yields only modest improvements on the newer We-Math dataset. These findings show that multi-agent pipelines provide clear benefits for open-source models and can assist strong proprietary systems on newer, less familiar benchmarks, but agentic decomposition is not universally optimal. All code, data, and reasoning files are available at https://github.com/faiyazabdullah/Interpreter-Solver
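The comparison described in the abstract can be illustrated with a minimal sketch. It assumes a two-stage "interpreter then solver" decomposition, which is inferred only from the repository name and is not confirmed by the abstract. The `Model` callable, the prompts, and every function name below are hypothetical placeholders, not the authors' code or any real library API.

```python
from typing import Callable, Optional, Sequence

# Hypothetical model interface: (text prompt, optional diagram reference) -> text answer.
Model = Callable[[str, Optional[str]], str]


def single_agent(model: Model, question: str, diagram: str) -> str:
    """One call: the same model reads the diagram and answers the question."""
    return model(f"Solve using the diagram. Question: {question}", diagram)


def multi_agent(interpreter: Model, solver: Model, question: str, diagram: str) -> str:
    """Two calls (assumed decomposition): describe the diagram, then solve from text."""
    description = interpreter("Describe the geometric facts shown in the diagram.", diagram)
    prompt = f"Diagram description: {description}\nQuestion: {question}"
    # The solver receives no image here, only the interpreter's text description.
    return solver(prompt, None)


def accuracy(predictions: Sequence[str], answers: Sequence[str]) -> float:
    """Exact-match accuracy in percent (a simplifying assumption about scoring)."""
    correct = sum(p.strip() == a.strip() for p, a in zip(predictions, answers))
    return 100.0 * correct / len(answers)


def gain_in_points(multi_acc: float, single_acc: float) -> float:
    """Difference in accuracy, in percentage points (multi-agent minus single-agent)."""
    return multi_acc - single_acc
```

Under this reading, a reported gain such as +6.8 points is the value `gain_in_points` would return for one model on one benchmark. A negative value would correspond to the cases where single-agent mode performs better.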

