CL · May 28, 2025

RAGPPI: RAG Benchmark for Protein-Protein Interactions in Drug Discovery

arXiv:2505.23823v1 · 1 citation · h-index: 4
Originality: Synthesis-oriented
AI Analysis

This work provides a resource for advancing RAG systems in drug discovery and addresses a specific gap in target identification, but it is incremental: it contributes a benchmark rather than a new method.

The authors tackled the lack of a benchmark for retrieving biological impacts of protein-protein interactions in drug discovery by introducing RAGPPI, a factual question-answer benchmark with 4,420 pairs, including a gold-standard dataset of 500 pairs and a silver-standard dataset of 3,720 pairs.

Retrieving the biological impacts of protein-protein interactions (PPIs) is essential for target identification (Target ID) in drug development. Given the vast number of proteins involved, this process remains time-consuming and challenging. Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) frameworks have supported Target ID; however, no benchmark currently exists for identifying the biological impacts of PPIs. To bridge this gap, we introduce the RAG Benchmark for PPIs (RAGPPI), a factual question-answer benchmark of 4,420 question-answer pairs that focus on the potential biological impacts of PPIs. Through interviews with experts, we identified criteria for a benchmark dataset, such as the QA type and source. We built a gold-standard dataset (500 QA pairs) through expert-driven data annotation. We developed an ensemble auto-evaluation LLM that reflects expert labeling characteristics, which facilitated the construction of a silver-standard dataset (3,720 QA pairs). We are committed to maintaining RAGPPI as a resource to support the research community in advancing RAG systems for drug discovery QA solutions.
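
The abstract does not say how the ensemble auto-evaluation combines its members, so the following is only a minimal sketch of one common choice, a simple majority vote, and should not be read as the authors' method. The names `Judge`, `ensemble_label`, and `build_silver_set`, as well as the "correct"/"incorrect" labels, are hypothetical; in practice each judge would wrap a call to an LLM evaluator.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

# Hypothetical judge signature: takes (question, answer) and returns a label
# such as "correct" or "incorrect". Real judges would wrap LLM calls.
Judge = Callable[[str, str], str]


def ensemble_label(question: str, answer: str, judges: Sequence[Judge]) -> str:
    """Return the label chosen by the most judges (simple majority vote).

    Assumption: majority voting stands in for the paper's unspecified
    ensemble rule. Using an odd number of judges avoids ties.
    """
    votes = Counter(judge(question, answer) for judge in judges)
    return votes.most_common(1)[0][0]


def build_silver_set(
    candidates: Sequence[Tuple[str, str]],
    judges: Sequence[Judge],
    accept_label: str = "correct",
) -> List[Tuple[str, str]]:
    """Keep only candidate QA pairs whose ensemble label equals accept_label."""
    return [
        (q, a)
        for q, a in candidates
        if ensemble_label(q, a, judges) == accept_label
    ]
```

Under these assumptions, a silver-standard set would be whatever `build_silver_set` retains from a pool of automatically generated QA pairs, with the expert-annotated gold set used to check that the judges agree with human labels.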

Code Implementations: 1 repo
