Hasan Slika

31.3AIApr 16

DeepER-Med: Advancing Deep Evidence-Based Research in Medicine Through Agentic AI

Zhizheng Wang, Chih-Hsuan Wei, Joey Chan et al.

Trustworthiness and transparency are essential for the clinical adoption of artificial intelligence (AI) in healthcare and biomedical research. Recent deep research systems aim to accelerate evidence-grounded scientific discovery by integrating AI agents with multi-hop information retrieval, reasoning, and synthesis. However, most existing systems lack explicit and inspectable criteria for evidence appraisal, creating a risk of compounding errors and making it difficult for researchers and clinicians to assess the reliability of their outputs. In parallel, current benchmarking approaches rarely evaluate performance on complex, real-world medical questions. Here, we introduce DeepER-Med, a Deep Evidence-based Research framework for Medicine with an agentic AI system. DeepER-Med frames deep medical research as an explicit and inspectable workflow of evidence-based generation, consisting of three modules: research planning, agentic collaboration, and evidence synthesis. To support realistic evaluation, we also present DeepER-MedQA, an evidence-grounded dataset comprising 100 expert-level research questions derived from authentic medical research scenarios and curated by a multidisciplinary panel of 11 biomedical experts. Expert manual evaluation demonstrates that DeepER-Med consistently outperforms widely used production-grade platforms across multiple criteria, including the generation of novel scientific insights. We further demonstrate the practical utility of DeepER-Med through eight real-world clinical cases. Human clinician assessment indicates that DeepER-Med's conclusions align with clinical recommendations in seven cases, highlighting its potential for medical research and decision support.

15.1AIApr 17

Large Language Models Meet Biomedical Knowledge Graphs for Mechanistically Grounded Therapeutic Prioritization

Chih-Hsuan Wei, Chi-Ping Day, Zhizheng Wang et al.

Drug repurposing is often framed as a candidate identification task, but existing approaches provide limited guidance for distinguishing biologically plausible candidates from historically well-connected ones. Here we introduce DrugKLM, a hybrid framework that integrates biomedical knowledge graph structure with large language model-based mechanistic reasoning to enable mechanistically grounded therapeutic prioritization. Across benchmark datasets, DrugKLM outperforms knowledge graph-only and language model-only baselines, including TxGNN. Beyond improved recall, DrugKLM confidence scores exhibit functional alignment with molecular phenotypes: higher scores are associated with transcriptional signatures linked to improved survival across 12 TCGA cancers. The scoring framework preferentially captures biologically perturbational signals rather than historical indication patterns. Expert curation across five cancers further reveals systematic differences in prioritization behavior, with DrugKLM elevating candidates supported by coherent mechanistic rationale and disease-specific clinical context. Together, these results establish DrugKLM as an evidence-integrative framework that translates heterogeneous biomedical data into mechanistically interpretable and clinically grounded therapeutic hypotheses.

Hasan Slika

2 Papers