Extending AI for Research to the Humanities: A Multi-Agent Framework for Evidence-Grounded Scholarship
Humanities scholars need AI tools capable of interpretive, evidence-grounded reasoning over primary sources, a need that existing research agents do not meet.
This paper introduces SPIRE, a multi-agent framework for evidence-grounded humanities scholarship that addresses the limitations of existing LLM-based research agents, which are optimized for science and engineering. On a benchmark of classical Chinese and Greco-Roman Latin scholarship, SPIRE recovers cited primary-source evidence more reliably than Naive LLM, Text RAG, and GraphRAG, and receives higher blind-judge scores for accuracy, depth, coverage, and evidence quality.
LLM-based research agents have advanced rapidly in science and engineering, where research is organized around executable experiments, code, and quantitative signals. Humanities scholarship, however, requires a different mode of reasoning: interpretive, evidence-grounded argument over primary sources, where scholarly value depends on faithful quotation, verifiable provenance, and close reading. Existing research agents remain largely optimized for execution and retrieval, not evidence-grounded interpretive reasoning. To address this gap, we introduce SPIRE (Scholarly-Primitives-Inspired Research Engine), a multi-agent framework for evidence-grounded humanities scholarship. Drawing on Scholarly Primitives theory, SPIRE casts recurring humanities operations as cooperating agent roles (source discovery, evidence annotation, comparison, provenance checking, sampling, citation binding, and argumentative synthesis) over a multi-scale close-reading substrate of passages, intra-context graph communities, and cross-context semantic clusters. On a peer-reviewed-paper benchmark over classical Chinese and Greco-Roman Latin scholarship, SPIRE recovers cited primary-source evidence more reliably than Naive LLM, Text RAG, and GraphRAG, and receives higher blind-judge scores on answer accuracy, depth, coverage, and evidence quality. Ablations show that both the scholarly-operation agents and close-reading retrieval contribute to evidence-grounded essays. Code, data catalogues, and reproduction scripts are released at https://github.com/YatingPan/SPIRE.
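To make the notions of faithful quotation and provenance checking concrete, the following is a minimal Python sketch of one such check: confirming that a quotation attributed to a source actually appears in that source's passage. It is an illustration under stated assumptions, not SPIRE's implementation. The `Passage` and `Citation` schemas, the `normalize` and `check_provenance` helpers, and the source-identifier format are all hypothetical names introduced here.

```python
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    """A primary-source passage with a provenance identifier (hypothetical schema)."""
    source_id: str  # e.g. work plus locus; the format is an assumption
    text: str


@dataclass(frozen=True)
class Citation:
    """A quotation that an essay attributes to a source (hypothetical schema)."""
    source_id: str
    quote: str


def normalize(s: str) -> str:
    # NFKC-normalize, casefold, and collapse whitespace so that trivial
    # formatting differences are not counted as misquotation.
    return " ".join(unicodedata.normalize("NFKC", s).casefold().split())


def check_provenance(citation: Citation, passages: list[Passage]) -> bool:
    """Return True only if the quote occurs in the passage it cites."""
    by_id = {p.source_id: p for p in passages}
    passage = by_id.get(citation.source_id)
    if passage is None:
        # The cited source is not in the corpus, so the quote is unverifiable.
        return False
    return normalize(citation.quote) in normalize(passage.text)


passages = [
    Passage("Verg. Aen. 1.1", "Arma virumque cano, Troiae qui primus ab oris"),
]
faithful = check_provenance(Citation("Verg. Aen. 1.1", "arma  virumque cano"), passages)
misquoted = check_provenance(Citation("Verg. Aen. 1.1", "arma virumque canto"), passages)
unknown = check_provenance(Citation("Verg. Aen. 9.9", "arma virumque cano"), passages)
```

Here `faithful` is true while `misquoted` and `unknown` are false. Exact substring matching after normalization is a deliberately strict choice for this sketch; a real system would likely need to tolerate editorial variants and differing editions.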