CL AI · Jul 17, 2024

Explainable Biomedical Hypothesis Generation via Retrieval Augmented Generation enabled Large Language Models

Alexander R. Pelletier, Joseph Ramirez, Irsyad Adam, Simha Sankar, Yu Yan, Ding Wang, Dylan Steinecke, Wei Wang, Peipei Ping

arXiv:2407.12888v1 · 3.49 citations · h-index: 7

Originality: Incremental advance

AI Analysis

The paper addresses hallucinated LLM responses in biomedical research, offering a tool to improve hypothesis generation and therapeutic evaluation. The advance appears incremental because it builds on existing RAG and LLM methods.

The authors tackle biomedical information overload by developing RUGGED, a workflow that integrates retrieval-augmented generation with large language models to generate explainable hypotheses. They demonstrate it in a clinical use-case that evaluates therapeutics for Arrhythmogenic Cardiomyopathy and Dilated Cardiomyopathy.

The vast amount of biomedical information available today presents a significant challenge for investigators seeking to digest, process, and understand these findings effectively. Large Language Models (LLMs) have emerged as powerful tools to navigate this complex and challenging data landscape. However, LLMs may produce hallucinated responses, making Retrieval Augmented Generation (RAG) crucial for obtaining accurate information. In this protocol, we present RUGGED (Retrieval Under Graph-Guided Explainable disease Distinction), a comprehensive workflow designed to support investigators with knowledge integration and hypothesis generation, identifying validated paths forward. Relevant biomedical information from publications and knowledge bases is reviewed, integrated, and extracted via text-mining association analysis and explainable graph prediction models on disease nodes, forecasting potential links among drugs and diseases. These analyses, along with biomedical texts, are integrated into a framework that facilitates user-directed mechanism elucidation as well as hypothesis exploration through RAG-enabled LLMs. A clinical use-case demonstrates RUGGED's ability to evaluate and recommend therapeutics for Arrhythmogenic Cardiomyopathy (ACM) and Dilated Cardiomyopathy (DCM), analyzing prescribed drugs for molecular interactions and unexplored uses. The platform minimizes LLM hallucinations, offers actionable insights, and improves the investigation of novel therapeutics.
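
As a rough illustration of the retrieval step that RAG places in front of an LLM, the following minimal sketch ranks a few evidence snippets against a question and assembles a prompt that restricts the model to the retrieved evidence. It is not the RUGGED implementation: the snippets, entity names (`drug_A`, `protein_P1`, `disease_X`), function names, and the bag-of-words cosine ranking are all assumptions chosen for brevity, and no LLM is called.

```python
import math
import re
from collections import Counter

# Hypothetical evidence snippets with placeholder entities (not real findings).
EVIDENCE = [
    "drug_A targets protein_P1",
    "protein_P1 is associated with disease_X",
    "drug_B targets protein_P2",
]

def bag_of_words(text):
    """Lowercase the text and count alphanumeric/underscore tokens."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def retrieve(query, corpus, k=2):
    """Return the k snippets most similar to the query."""
    q = bag_of_words(query)
    ranked = sorted(corpus, key=lambda doc: cosine(q, bag_of_words(doc)), reverse=True)
    return ranked[:k]

def build_prompt(query, passages):
    """Assemble a prompt that asks the model to rely on numbered evidence only."""
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, 1))
    return (
        "Answer using only the numbered evidence and cite it.\n"
        f"Evidence:\n{context}\n"
        f"Question: {query}"
    )

query = "Which drug could be relevant to disease_X via protein_P1?"
passages = retrieve(query, EVIDENCE, k=2)
prompt = build_prompt(query, passages)
print(prompt)
```

Because the prompt carries only the retrieved snippets, the unrelated `drug_B` statement never reaches the model, and each claim in the answer can be traced to a numbered source. A real system would likely replace the toy ranking with embedding-based or graph-based retrieval.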

