CV | May 21

AgroTools: A Benchmark for Tool-Augmented Multimodal Agents in Agriculture

arXiv:2605.22366 | 81.2 | Has Code
AI Analysis

For researchers developing multimodal agents for precision agriculture, this benchmark provides a structured evaluation of tool-use capabilities, highlighting critical weaknesses in current models.

AgroTools introduces a benchmark for evaluating tool-augmented multimodal agents in agriculture, containing 539 QA instances with 1,097 images and 14 tools. Results show current models are far from reliable, with bottlenecks in tool planning, argument generation, execution recovery, and answer synthesis.

Agricultural decision-making increasingly requires multimodal systems that can transform visual observations into reliable, executable actions. However, existing agricultural multimodal benchmarks mainly evaluate final-answer correctness and provide limited support for assessing whether models can use external tools to complete precision-sensitive workflows. In this paper, we introduce AgroTools, a benchmark for evaluating tool-augmented multimodal agents in agriculture. AgroTools contains 539 question-answer instances paired with 1,097 heterogeneous agricultural images, spanning five task families and an executable environment of 14 agricultural tools. Each query is annotated with structured tool-use traces, enabling a dual-view evaluation of both process-level execution quality and outcome-level task success. We benchmark 9 open-source and 4 closed-source multimodal large language models on AgroTools. Results show that current models remain far from reliable in agricultural tool-use settings, with clear bottlenecks in tool planning, argument generation, execution recovery, and final-answer synthesis. We hope AgroTools will support future research on multimodal agents for high-precision agricultural applications. The benchmark and evaluation are available at https://huggingface.co/datasets/AgroTools/AgroTools.
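The dual-view evaluation described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the benchmark's actual schema or metrics: the `ToolCall` structure, the tool names `segment_leaf` and `estimate_area`, the position-by-position trace comparison, and the normalized exact-match answer check are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    """One step in a tool-use trace (hypothetical schema, not AgroTools' own)."""
    tool: str
    args: dict = field(default_factory=dict)


def process_scores(predicted, reference):
    """Process-level view: compare predicted and reference steps by position."""
    n = max(len(predicted), len(reference))
    if n == 0:
        return {"tool_acc": 1.0, "arg_acc": 1.0}
    tool_hits = 0
    arg_hits = 0
    for pred, ref in zip(predicted, reference):
        if pred.tool == ref.tool:
            tool_hits += 1
            # Arguments only count when the right tool was chosen.
            if pred.args == ref.args:
                arg_hits += 1
    # Dividing by the longer trace penalizes both missing and extra steps.
    return {"tool_acc": tool_hits / n, "arg_acc": arg_hits / n}


def outcome_score(predicted_answer, reference_answer):
    """Outcome-level view: case- and whitespace-insensitive exact match."""
    def normalize(text):
        return " ".join(text.lower().split())
    return float(normalize(predicted_answer) == normalize(reference_answer))


def evaluate(predicted_trace, predicted_answer, reference_trace, reference_answer):
    """Combine both views into one score dictionary."""
    scores = process_scores(predicted_trace, reference_trace)
    scores["answer_correct"] = outcome_score(predicted_answer, reference_answer)
    return scores


# Hypothetical example: correct tools, one wrong argument, correct final answer.
reference_trace = [
    ToolCall("segment_leaf", {"image": "img_0.jpg"}),
    ToolCall("estimate_area", {"mask": "mask_0", "unit": "cm2"}),
]
predicted_trace = [
    ToolCall("segment_leaf", {"image": "img_0.jpg"}),
    ToolCall("estimate_area", {"mask": "mask_0", "unit": "mm2"}),
]
result = evaluate(predicted_trace, "12.5 cm2", reference_trace, "12.5 CM2")
print(result)
```

Keeping the process and outcome scores separate makes it visible when a model reaches the right answer through a flawed trace, or follows the right trace but synthesizes the wrong answer.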
