CBR-to-SQL: Rethinking Retrieval-based Text-to-SQL using Case-based Reasoning in the Healthcare Domain
For healthcare researchers and clinicians needing to query EHR databases without SQL expertise, CBR-to-SQL offers a more robust and sample-efficient alternative to standard RAG, though improvements are incremental.
CBR-to-SQL decomposes retrieval-augmented generation for text-to-SQL into two stages—structural and entity alignment—achieving competitive accuracy on clinical benchmarks with higher sample efficiency and robustness than standard RAG, especially under data scarcity.
Extracting insights from Electronic Health Record (EHR) databases often requires SQL expertise, creating a barrier for clinical decision-making and research. A promising approach is to use Large Language Models (LLMs) to translate natural language questions into SQL through Retrieval-Augmented Generation (RAG), where relevant question-SQL examples are retrieved to generate new queries via few-shot learning. However, adapting this method to the medical domain is non-trivial, as effective retrieval requires examples that align with both the logical structure of the question and its referenced entities (e.g., drug names, procedure titles). Standard single-step RAG struggles to optimize both aspects simultaneously and often relies on near-exact matches to generalize effectively. This issue is especially severe in healthcare, as questions often contain noisy and inconsistent medical jargon. To address this, we present CBR-to-SQL, a framework inspired by Case-based Reasoning theory that decomposes RAG's single-step retrieval into two explicit stages: one that focuses on retrieving structurally relevant examples, and one that aligns entities with the target database schema. Evaluated on two clinical benchmarks, CBR-to-SQL achieves competitive accuracies compared to fine-tuned methods. More importantly, it demonstrates considerably higher sample efficiency and robustness than the standard RAG approach, particularly under data scarcity and retrieval perturbations.