CL | Jul 29, 2025

AgriEval: A Comprehensive Chinese Agricultural Benchmark for Large Language Models

Lian Yan, Haotian Wang, Chen Tang, Haifeng Liu, Tianyang Sun, Liangliang Liu, Yi Guan, Jingchi Jiang

arXiv:2507.21773v1 | 4 citations | h-index: 16 | Has Code

Originality: Synthesis-oriented

AI Analysis

This paper addresses the problem of evaluating and improving LLMs for agricultural applications, particularly in Chinese-language contexts. The contribution is incremental: it introduces a new benchmark dataset.

The authors address the lack of training data and evaluation benchmarks for large language models (LLMs) in agriculture by proposing AgriEval, a comprehensive Chinese agricultural benchmark with 14,697 multiple-choice and 2,167 open-ended questions. Evaluating 51 open-source and commercial LLMs, they find that most struggle to reach 60% accuracy.

In the agricultural domain, the deployment of large language models (LLMs) is hindered by the lack of training data and evaluation benchmarks. To mitigate this issue, we propose AgriEval, the first comprehensive Chinese agricultural benchmark with three main characteristics: (1) Comprehensive Capability Evaluation. AgriEval covers six major agricultural categories and 29 subcategories, addressing four core cognitive scenarios: memorization, understanding, inference, and generation. (2) High-Quality Data. The dataset is curated from university-level examinations and assignments, providing a natural and robust benchmark for assessing the capacity of LLMs to apply knowledge and make expert-like decisions. (3) Diverse Formats and Extensive Scale. AgriEval comprises 14,697 multiple-choice questions and 2,167 open-ended questions, establishing it as the most extensive agricultural benchmark available to date. We also present comprehensive experimental results on 51 open-source and commercial LLMs. The experimental results reveal that most existing LLMs struggle to achieve 60% accuracy, underscoring the room for improvement in agricultural LLMs. Additionally, we conduct extensive experiments to investigate factors influencing model performance and propose strategies for enhancement. AgriEval is available at https://github.com/YanPioneer/AgriEval/.
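The 60% figure above is an accuracy on multiple-choice questions. As a minimal sketch of how such a score could be computed overall and per category, the Python below assumes a hypothetical record layout: the field names `category`, `answer`, and `prediction`, and the category labels, are illustrative and are not taken from the AgriEval repository, whose actual schema and scoring script may differ.

```python
from collections import defaultdict

# Hypothetical records: field names and category labels are assumptions
# made for illustration, not AgriEval's actual data schema.
records = [
    {"category": "category_a", "answer": "A", "prediction": "A"},
    {"category": "category_a", "answer": "B", "prediction": " b "},
    {"category": "category_b", "answer": "C", "prediction": "C"},
    {"category": "category_b", "answer": "D", "prediction": "A"},
]


def is_correct(record):
    """Compare the predicted option letter with the gold one, ignoring case and whitespace."""
    return record["prediction"].strip().upper() == record["answer"].strip().upper()


def overall_accuracy(records):
    """Fraction of questions answered correctly across all categories."""
    if not records:
        return 0.0
    return sum(is_correct(r) for r in records) / len(records)


def accuracy_by_category(records):
    """Fraction of questions answered correctly within each category."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        total[r["category"]] += 1
        if is_correct(r):
            correct[r["category"]] += 1
    return {category: correct[category] / total[category] for category in total}


if __name__ == "__main__":
    score = overall_accuracy(records)
    print(f"overall accuracy: {score:.2%}")
    print(f"below 60% threshold: {score < 0.60}")
    for category, acc in sorted(accuracy_by_category(records).items()):
        print(f"{category}: {acc:.2%}")
```

Exact-match scoring of this kind applies only to the multiple-choice portion. The open-ended questions would need a different metric, which this sketch does not attempt.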
