Knowledge Synthesis of Photosynthesis Research Using a Large Language Model
This study addresses the problem of inaccurate scientific context in AI tools for plant science researchers, offering an incremental improvement through enhanced retrieval and prompt optimization.
The study tackled the challenge of applying large language models to complex biological data in photosynthesis research by developing a photosynthesis research assistant (PRAG) based on GPT-4o with retrieval-augmented generation and prompt optimization. PRAG achieved an average 8.7% improvement across five scientific writing metrics and matched key entities with up to 63% of the database papers.
The development of biological data analysis tools and large language models (LLMs) has opened up new possibilities for using AI in plant science research, with the potential to contribute significantly to knowledge integration and the identification of research gaps. Nonetheless, current LLMs struggle to handle the complex biological data and theoretical models of photosynthesis research and often fail to provide accurate scientific context. Therefore, this study proposed a photosynthesis research assistant (PRAG) based on OpenAI's GPT-4o with retrieval-augmented generation (RAG) and prompt optimization. Vector databases and an automated feedback loop were used in the prompt optimization process to enhance the accuracy and relevance of the responses to photosynthesis-related queries. PRAG showed an average improvement of 8.7% across five metrics related to scientific writing, with a 25.4% increase in source transparency. Additionally, its scientific depth and domain coverage were comparable to those of photosynthesis research papers. A knowledge graph was used to structure PRAG's responses together with papers within and outside the database, which showed that PRAG matched key entities with 63% of the database papers and 39.5% of the test papers. PRAG can be applied to photosynthesis research and broader plant science domains, paving the way for more in-depth data analysis and predictive capabilities.
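The retrieval step of a RAG pipeline such as the one described above can be illustrated with a minimal sketch. It is not PRAG's implementation: the abstract does not specify the embedding model, the vector database, or the prompt wording, so the bag-of-words `embed` function, the in-memory passage list, and the names `retrieve` and `build_prompt` are all hypothetical stand-ins chosen to keep the example self-contained.

```python
import math
from collections import Counter


def embed(text):
    """Toy 'embedding': lowercase token counts.

    Assumption: stands in for a real embedding model, which the
    abstract does not name.
    """
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, passages, k=2):
    """Return the k passages most similar to the query.

    Assumption: a plain list replaces the vector database.
    """
    q = embed(query)
    ranked = sorted(passages, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:k]


def build_prompt(query, passages, k=2):
    """Assemble a prompt that asks the model to cite numbered sources.

    Assumption: the instruction text is illustrative, not PRAG's prompt.
    """
    context = retrieve(query, passages, k)
    numbered = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(context))
    return (
        "Answer using only the sources below and cite them by number.\n\n"
        f"Sources:\n{numbered}\n\n"
        f"Question: {query}"
    )
```

Numbering the retrieved passages in the prompt is one simple way to encourage the source transparency that the abstract reports as a metric. In a full system, the returned prompt would be sent to the LLM, and a feedback loop could score the response and revise the instruction text.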