CL AI · Aug 22, 2025

Compiling Prompts, Not Crafting Them: A Reproducible Workflow for AI-Assisted Evidence Synthesis

arXiv:2509.00038v16.72 citations · h-index: 27

Originality: Synthesis-oriented

AI Analysis

This work addresses the need for more reliable and transparent AI-assisted evidence synthesis in scientific research. Its contribution is incremental: it applies existing prompt optimisation methods to a specific domain rather than introducing new ones.

The paper targets unreliable, non-reproducible prompts in systematic literature reviews assisted by large language models. It proposes a structured framework that adapts declarative prompt optimisation methods, yielding a reproducible workflow with automated prompt tuning and test suites, illustrated with concrete code examples.

Large language models (LLMs) offer significant potential to accelerate systematic literature reviews (SLRs), yet current approaches often rely on brittle, manually crafted prompts that compromise reliability and reproducibility. This fragility undermines scientific confidence in LLM-assisted evidence synthesis. In response, this work adapts recent advances in declarative prompt optimisation, developed for general-purpose LLM applications, and demonstrates their applicability to the domain of SLR automation. This research proposes a structured, domain-specific framework that embeds task declarations, test suites, and automated prompt tuning into a reproducible SLR workflow. These emerging methods are translated into a concrete blueprint with working code examples, enabling researchers to construct verifiable LLM pipelines that align with established principles of transparency and rigour in evidence synthesis. This is a novel application of such approaches to SLR pipelines.
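
The three ingredients named in the abstract (a task declaration, a test suite, and automated prompt tuning) can be illustrated with a minimal sketch. This is not the paper's code: every name here (`ScreeningTask`, `Example`, `stub_model`, `compile_prompt`, the candidate templates, and the toy abstracts) is hypothetical. The model is a deterministic stub standing in for a real LLM call, and the exhaustive search over two candidate templates is a simplified stand-in for a real prompt optimiser.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class ScreeningTask:
    """Declarative description of an abstract-screening step (hypothetical)."""
    instruction: str
    labels: Tuple[str, ...] = ("include", "exclude")


@dataclass(frozen=True)
class Example:
    """One labelled case in the test suite."""
    abstract: str
    expected: str


def render_prompt(task: ScreeningTask, template: str, abstract: str) -> str:
    """Fill a candidate template from the task declaration."""
    return template.format(
        instruction=task.instruction,
        labels="/".join(task.labels),
        abstract=abstract,
    )


def stub_model(prompt: str) -> str:
    """Deterministic stand-in for an LLM call (assumption, not a real API)."""
    text = prompt.lower()
    if "criteria:" not in text:
        return "include"  # with no criteria in the prompt, the stub accepts everything
    abstract = text.split("abstract:", 1)[-1]
    return "include" if "randomised" in abstract else "exclude"


def evaluate(task: ScreeningTask, template: str, suite: Sequence[Example],
             model: Callable[[str], str]) -> float:
    """Return the fraction of test-suite examples the template gets right."""
    hits = sum(
        model(render_prompt(task, template, ex.abstract)) == ex.expected
        for ex in suite
    )
    return hits / len(suite)


def compile_prompt(task: ScreeningTask, candidates: Sequence[str],
                   suite: Sequence[Example],
                   model: Callable[[str], str]) -> Tuple[str, float]:
    """Pick the candidate template with the highest test-suite accuracy."""
    scored = [(evaluate(task, t, suite, model), t) for t in candidates]
    best_score, best_template = max(scored, key=lambda pair: pair[0])
    return best_template, best_score


TASK = ScreeningTask(instruction="Include only randomised controlled trials.")

SUITE = [
    Example("A randomised controlled trial of drug X in adults.", "include"),
    Example("A narrative review of drug X pharmacology.", "exclude"),
    Example("Randomised trial comparing two screening tools.", "include"),
    Example("Case report of a single patient.", "exclude"),
]

TEMPLATES = [
    "Abstract: {abstract}\nLabel:",
    "Criteria: {instruction}\nAnswer with one of: {labels}\n"
    "Abstract: {abstract}\nLabel:",
]

best_template, best_score = compile_prompt(TASK, TEMPLATES, SUITE, stub_model)
print(f"selected template scored {best_score:.2f} on the test suite")
```

Because the selected prompt is a function of the declared task, the candidate set, and the test suite, rerunning the script reproduces the same choice. That is the property the abstract contrasts with manually crafted prompts.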
