CL · May 22, 2025

Continually Self-Improving Language Models for Bariatric Surgery Question-Answering

Yash Kumar Atri, Thomas H Shin, Thomas Hartvigsen

arXiv:2505.16102v2 · 4.91 citations · h-index: 8

Originality: Incremental advance

AI Analysis

This work addresses healthcare disparities in bariatric surgery care by improving patient access to reliable information. The contribution is incremental, since it adapts existing RAG methods to a specific domain.

The paper tackles the problem of providing timely, evidence-based information to bariatric surgery patients. It introduces bRAGgen, a self-updating RAG-based model that integrates real-time medical evidence. In evaluations against state-of-the-art models, bRAGgen is reported to perform substantially better at generating clinically accurate responses.

While bariatric and metabolic surgery (MBS) is considered the gold standard treatment for severe and morbid obesity, its therapeutic efficacy hinges upon active and longitudinal engagement with multidisciplinary providers, including surgeons, dietitians/nutritionists, psychologists, and endocrinologists. This engagement spans the entire patient journey, from preoperative preparation to long-term postoperative management. However, this process is often hindered by numerous healthcare disparities, such as logistical and access barriers, which impair easy patient access to timely, evidence-based, clinician-endorsed information. To address these gaps, we introduce bRAGgen, a novel adaptive retrieval-augmented generation (RAG)-based model that autonomously integrates real-time medical evidence when response confidence dips below dynamic thresholds. This self-updating architecture ensures that responses remain current and accurate, reducing the risk of misinformation. Additionally, we present bRAGq, a curated dataset of 1,302 bariatric surgery-related questions, validated by an expert bariatric surgeon. bRAGq constitutes the first large-scale, domain-specific benchmark for comprehensive MBS care. In a two-phase evaluation, bRAGgen is benchmarked against state-of-the-art models using both large language model (LLM)-based metrics and expert surgeon review. Across all evaluation dimensions, bRAGgen demonstrates substantially superior performance in generating clinically accurate and relevant responses.
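
The abstract describes retrieval that is triggered only when response confidence falls below a dynamic threshold. The sketch below illustrates that general idea and is not the authors' implementation. The confidence score (geometric-mean token probability), the threshold rule (a scaled exponential moving average of past confidences), and every name in the code (`mean_token_confidence`, `DynamicThreshold`, `answer`, and the `generate` and `retrieve` callables) are assumptions made for illustration. The abstract specifies none of them, and the sketch does not cover whatever self-updating the model itself performs.

```python
import math
from typing import Callable, List, Optional, Tuple

# Hypothetical signatures: a generator returns (text, per-token log-probabilities),
# and a retriever returns evidence text for a question.
Generator = Callable[[str, Optional[str]], Tuple[str, List[float]]]
Retriever = Callable[[str], str]


def mean_token_confidence(token_logprobs: List[float]) -> float:
    """Geometric-mean token probability, an assumed stand-in for response confidence."""
    if not token_logprobs:
        return 0.0
    return math.exp(sum(token_logprobs) / len(token_logprobs))


class DynamicThreshold:
    """Assumed threshold rule: a scaled exponential moving average of past confidences."""

    def __init__(self, initial: float = 0.5, alpha: float = 0.2, scale: float = 0.9) -> None:
        self.ema = initial    # running estimate of typical confidence
        self.alpha = alpha    # weight given to the newest observation
        self.scale = scale    # fraction of the running estimate used as the cutoff

    def value(self) -> float:
        return self.scale * self.ema

    def update(self, confidence: float) -> None:
        self.ema = (1 - self.alpha) * self.ema + self.alpha * confidence


def answer(
    question: str,
    generate: Generator,
    retrieve: Retriever,
    threshold: DynamicThreshold,
) -> Tuple[str, bool]:
    """Return (response, used_retrieval); retrieve only when confidence is below the cutoff."""
    draft, logprobs = generate(question, None)
    confidence = mean_token_confidence(logprobs)
    cutoff = threshold.value()
    threshold.update(confidence)  # the cutoff adapts to the confidences seen so far
    if confidence >= cutoff:
        return draft, False
    evidence = retrieve(question)
    revised, _ = generate(question, evidence)
    return revised, True
```

With a stub `generate` that returns low token probabilities, `answer` falls through to `retrieve` and regenerates with the evidence; with high probabilities it returns the draft directly. Any calibrated confidence estimate could replace the scoring function without changing the control flow.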
