Evaluating LLM-Based Grant Proposal Review via Structured Perturbations

William Thorne, Joseph James, Yang Wang, Chenghua Lin, Diana Maynard

arXiv:2603.08281 | 14.51 citations | h-index: 3

AI Analysis

This paper addresses the problem of AI-assisted grant proposals overwhelming the manual review capacity of research funding organizations, though it is an incremental evaluation study.

This paper investigated LLM-based grant proposal reviewing using a perturbation framework on six EPSRC proposals, finding that section-by-section analysis significantly outperformed other architectures in detection rate and scoring reliability, while computationally expensive ensemble methods performed no better than baseline.

As AI-assisted grant proposals outpace manual review capacity in a kind of "Malthusian trap" for the research ecosystem, this paper investigates the capabilities and limitations of LLM-based grant reviewing for high-stakes evaluation. Using six EPSRC proposals, we develop a perturbation-based framework probing LLM sensitivity across six quality axes: funding, timeline, competency, alignment, clarity, and impact. We compare three review architectures: single-pass review, section-by-section analysis, and a "Council of Personas" ensemble emulating expert panels. The section-level approach significantly outperforms alternatives in both detection rate and scoring reliability, while the computationally expensive council method performs no better than baseline. Detection varies substantially by perturbation type, with alignment issues readily identified but clarity flaws largely missed by all systems. Human evaluation shows LLM feedback is largely valid but skewed toward compliance checking over holistic assessment. We conclude that current LLMs may provide supplementary value within EPSRC review but exhibit high variability and misaligned review priorities. We release our code and any non-protected data.
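To make the detection-rate comparison concrete, the following is a minimal sketch of how per-architecture, per-axis detection rates could be tallied from perturbation trials. It is not the authors' released code: the function name `detection_rates`, the trial tuple layout, the architecture labels, and the toy outcomes are all assumptions made for illustration, and only the six axis names come from the abstract.

```python
from collections import defaultdict

# The six quality axes named in the abstract.
AXES = ("funding", "timeline", "competency", "alignment", "clarity", "impact")


def detection_rates(trials):
    """Return {(architecture, axis): fraction of perturbed proposals flagged}.

    `trials` is an iterable of (architecture, axis, detected) tuples.
    This layout is a hypothetical one chosen for this sketch.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for architecture, axis, detected in trials:
        if axis not in AXES:
            raise ValueError(f"unknown axis: {axis}")
        key = (architecture, axis)
        totals[key] += 1
        hits[key] += int(bool(detected))
    return {key: hits[key] / totals[key] for key in totals}


# Made-up trial outcomes for illustration only; NOT results from the paper.
toy_trials = [
    ("section", "alignment", True),
    ("section", "alignment", True),
    ("section", "clarity", False),
    ("section", "clarity", True),
    ("single_pass", "alignment", True),
    ("single_pass", "clarity", False),
]
rates = detection_rates(toy_trials)
```

Keying the tally by architecture and axis together keeps the two comparisons the abstract draws (across architectures and across perturbation types) readable from a single table.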
