CL · May 28

Protocol for evaluating ChatGPT in biomedical association generation and verification using a RAG-enabled, cross-model majority voting workflow

arXiv:2605.3040079.81 citations · h-index: 10 · Has Code
Predicted impact top 72% in CL · last 90 days · Originality: Incremental advance
AI Analysis

This protocol helps biomedical researchers and practitioners evaluate the reliability and accuracy of LLM-generated biomedical associations.

This paper presents a protocol for evaluating ChatGPT's ability to generate and verify disease-centric biomedical associations. It outlines methods for generating associations, validating biological entities using ontologies, and verifying associations through literature, including a RAG-enabled, cross-model majority voting workflow to semantically verify content and expose hallucinations.

We present a protocol to evaluate ChatGPT's ability to generate disease-centric biomedical associations. It outlines how we generate the associations, validate the biological entities using biomedical ontologies, and verify associations using literature. The protocol includes a self-consistency strategy to assess generative reliability across ChatGPT models. To address ontology exact-match limitations, we provide a use case performing semantic verification through a workflow enabled by Retrieval-Augmented Generation (RAG) powered by open-source large language models (LLMs). This enables LLMs to establish truth over content generated by other LLMs and expose hallucination.
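The two scoring ideas named in the abstract, self-consistency across repeated generations and majority voting across verifier models, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the model names, the "supported"/"unsupported" labels, the example associations, and the strict-majority rule are all assumptions chosen for illustration.

```python
from collections import Counter


def majority_vote(verdicts):
    """Return the label backed by a strict majority of models.

    `verdicts` maps a (hypothetical) model name to its label for one
    association. Ties and empty inputs yield "undecided".
    """
    if not verdicts:
        return "undecided"
    label, count = Counter(verdicts.values()).most_common(1)[0]
    return label if count > len(verdicts) / 2 else "undecided"


def self_consistency(runs):
    """Return, per association, the fraction of runs that produced it.

    `runs` is a list of generations; each generation is a list of
    associations. Duplicates within a single run are counted once.
    """
    counts = Counter(assoc for run in runs for assoc in set(run))
    return {assoc: n / len(runs) for assoc, n in counts.items()}


# Hypothetical verdicts from three verifier models for one association.
verdicts = {
    "model_a": "supported",
    "model_b": "supported",
    "model_c": "unsupported",
}
decision = majority_vote(verdicts)

# Hypothetical repeated generations for the same disease prompt.
runs = [
    [("disease_x", "gene_1"), ("disease_x", "gene_2")],
    [("disease_x", "gene_1")],
]
scores = self_consistency(runs)
```

Requiring a strict majority means an even split between verifiers is reported as "undecided" rather than resolved arbitrarily; a real workflow might instead escalate such cases to manual review.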
