Patience is all you need! An agentic system for performing scientific literature review
This work addresses the challenge of automating nuanced scientific literature reviews for researchers, but it is incremental as it builds on existing LLM and retrieval methods.
The authors tackled the problem of using LLMs for scientific literature review by building a system that searches and distills information from full texts, demonstrating that sparse retrieval methods achieve near-state-of-the-art results on biology questions without the complexity of dense retrieval.
Large language models (LLMs) are increasingly used to support question answering across numerous disciplines. On their own, the models have shown promise for answering basic questions, but they fail quickly where expert domain knowledge is required or the question is nuanced. Scientific research often involves searching for relevant literature, distilling pertinent information from that literature, and analysing how the findings support or contradict one another. The information is often encapsulated in the full text of research articles rather than just in the abstracts, and statements within these articles frequently require the wider article context to be fully understood. We have built an LLM-based system that performs such search and distillation of information encapsulated in scientific literature, and we evaluate our keyword-based search and information distillation system against a set of biology-related questions from previously released literature benchmarks. We demonstrate that sparse retrieval methods achieve results close to the state of the art without the need for dense retrieval and its associated infrastructure and complexity overhead. We also show how to increase the coverage of relevant documents for literature review generation.
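To make the contrast between sparse and dense retrieval concrete, the following is a minimal sketch of keyword-based ranking. It assumes BM25 as the scoring function, which is a common choice for sparse retrieval but is not stated in the abstract; the function names, the parameters `k1` and `b`, and the toy documents are all illustrative and are not taken from the system described here. The point of the sketch is that scoring needs only term counts, with no embedding model or vector index.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rank(query, documents, k1=1.5, b=0.75):
    """Return (doc_index, score) pairs sorted by descending BM25 score.

    Assumption: standard BM25 with illustrative default parameters.
    """
    if not documents:
        return []
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    query_terms = set(tokenize(query))

    scores = []
    for idx, toks in enumerate(tokenized):
        term_freq = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in term_freq:
                continue
            df = doc_freq[term]
            # Rare terms receive a higher inverse document frequency.
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            tf = term_freq[term]
            # Length normalisation dampens the advantage of long documents.
            norm = tf + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append((idx, score))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy corpus standing in for article full texts.
docs = [
    "CRISPR Cas9 enables targeted genome editing in human cells",
    "Dense vector indexes require embedding models and a vector database",
    "Off-target effects of Cas9 genome editing were measured in mouse embryos",
]
ranked = bm25_rank("Cas9 genome editing off-target effects", docs)
```

In this toy corpus the third document matches every query term and ranks first, while the second shares no terms with the query and scores zero. A dense retriever would instead need to embed every document and maintain a nearest-neighbour index, which is the infrastructure overhead the abstract refers to.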