Towards Large Reasoning Models for Agriculture
This work addresses the need for better AI tools in agriculture by contributing a benchmark (AgReason) and a dataset (AgThoughts). The contribution is incremental in that it builds on existing reasoning models.
The paper tackles agricultural decision-making by introducing AgReason, a benchmark for evaluating large reasoning models, and finds that these models outperform conventional LLMs, although the best model reaches only 36% accuracy. It also presents AgThoughts, a dataset used to develop AgThinker, a suite of small reasoning models, and shows that the dataset can improve agricultural reasoning in LLMs.
Agricultural decision-making involves complex, context-specific reasoning, where choices about crops, practices, and interventions depend heavily on geographic, climatic, and economic conditions. Traditional large language models (LLMs) often fall short in navigating this nuanced problem due to limited reasoning capacity. We hypothesize that recent advances in large reasoning models (LRMs) can better handle such structured, domain-specific inference. To investigate this, we introduce AgReason, the first expert-curated, open-ended science benchmark for agricultural reasoning, comprising 100 questions. Evaluations across thirteen open-source and proprietary models reveal that LRMs outperform conventional models, though notable challenges persist: the strongest Gemini-based baseline achieves only 36% accuracy. We also present AgThoughts, a large-scale dataset of 44.6K question-answer pairs generated with human oversight and equipped with synthetically generated reasoning traces. Using AgThoughts, we develop AgThinker, a suite of small reasoning models that can be run on consumer-grade GPUs, and show that our dataset can be effective in unlocking agricultural reasoning abilities in LLMs. Project page: https://baskargroup.github.io/Ag_reasoning/