CL | May 19, 2025

SeedBench: A Multi-task Benchmark for Evaluating Large Language Models in Seed Science

arXiv:2505.13220v1 | 8 citations | h-index: 3 | Has Code | ACL
Originality: Synthesis-oriented
AI Analysis

This paper addresses the limited technological support and shortage of experts in seed science for agriculture. Its contribution is incremental, since it focuses on benchmarking rather than novel model development.

The authors tackled the lack of standardized benchmarks for evaluating large language models (LLMs) in seed science by introducing SeedBench, a multi-task benchmark developed with domain experts, and found substantial gaps in performance across 26 leading LLMs.

Seed science is essential for modern agriculture, directly influencing crop yields and global food security. However, challenges such as interdisciplinary complexity and high costs with limited returns hinder progress, leading to a shortage of experts and insufficient technological support. While large language models (LLMs) have shown promise across various fields, their application in seed science remains limited due to the scarcity of digital resources, complex gene-trait relationships, and the lack of standardized benchmarks. To address this gap, we introduce SeedBench -- the first multi-task benchmark specifically designed for seed science. Developed in collaboration with domain experts, SeedBench focuses on seed breeding and simulates key aspects of modern breeding processes. We conduct a comprehensive evaluation of 26 leading LLMs, encompassing proprietary, open-source, and domain-specific fine-tuned models. Our findings not only highlight the substantial gaps between the power of LLMs and the real-world seed science problems, but also make a foundational step for research on LLMs for seed design.

Code Implementations: 1 repo
