BiomedSQL: Text-to-SQL for Scientific Reasoning on Biomedical Knowledge Bases
This paper addresses the need for better text-to-SQL systems to support biomedical researchers by providing a new benchmark. The contribution is incremental, since it focuses on evaluation rather than a novel method.
The paper tackles the problem of text-to-SQL systems struggling with scientific reasoning over biomedical databases by introducing BiomedSQL, a benchmark of 68,000 question/SQL/answer triples. On this benchmark, GPT-o3-mini and the authors' multi-step agent, BMSQL, achieve 59.0% and 62.6% execution accuracy, respectively, both well below an expert baseline of 90.0%.
Biomedical researchers increasingly rely on large-scale structured databases for complex analytical tasks. However, current text-to-SQL systems often struggle to map qualitative scientific questions into executable SQL, particularly when implicit domain reasoning is required. We introduce BiomedSQL, the first benchmark explicitly designed to evaluate scientific reasoning in text-to-SQL generation over a real-world biomedical knowledge base. BiomedSQL comprises 68,000 question/SQL query/answer triples generated from templates and grounded in a harmonized BigQuery knowledge base that integrates gene-disease associations, causal inference from omics data, and drug approval records. Each question requires models to infer domain-specific criteria, such as genome-wide significance thresholds, effect directionality, or trial phase filtering, rather than rely on syntactic translation alone. We evaluate a range of open- and closed-source LLMs across prompting strategies and interaction paradigms. Our results reveal a substantial performance gap: GPT-o3-mini achieves 59.0% execution accuracy, while our custom multi-step agent, BMSQL, reaches 62.6%, both well below the expert baseline of 90.0%. BiomedSQL provides a new foundation for advancing text-to-SQL systems capable of supporting scientific discovery through robust reasoning over structured biomedical knowledge bases. Our dataset is publicly available at https://huggingface.co/datasets/NIH-CARD/BiomedSQL, and our code is open-source at https://github.com/NIH-CARD/biomedsql.
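To make the notions of implicit domain criteria and execution accuracy concrete, the following is a minimal sketch and not the benchmark's actual harness. It assumes a hypothetical toy table `gwas` in an in-memory SQLite database (the real knowledge base is hosted on BigQuery and its schema is not reproduced here), placeholder gene and disease names, and a simple definition of execution accuracy as the fraction of predicted queries whose result set equals that of the gold query; the paper's exact metric may differ. The question "Which genes are significantly associated with DiseaseX?" never states a threshold, so a correct query has to supply the conventional genome-wide significance cutoff of p < 5e-8 itself.

```python
import sqlite3


def build_toy_db():
    """Create a hypothetical in-memory table standing in for a GWAS summary table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE gwas (gene TEXT, disease TEXT, p_value REAL, beta REAL)"
    )
    conn.executemany(
        "INSERT INTO gwas VALUES (?, ?, ?, ?)",
        [
            ("GENE_A", "DiseaseX", 1e-12, 0.30),   # passes p < 5e-8
            ("GENE_B", "DiseaseX", 3e-6, 0.10),    # fails the threshold
            ("GENE_C", "DiseaseX", 4e-9, -0.20),   # passes p < 5e-8
        ],
    )
    return conn


def run(conn, sql):
    """Execute a query and return its rows as an order-insensitive set."""
    return frozenset(conn.execute(sql).fetchall())


def execution_accuracy(conn, pairs):
    """Fraction of (predicted, gold) query pairs whose result sets match.

    Assumed definition for illustration only.
    """
    matches = sum(run(conn, pred) == run(conn, gold) for pred, gold in pairs)
    return matches / len(pairs)


# Gold query encodes the implicit genome-wide significance threshold.
GOLD = "SELECT gene FROM gwas WHERE disease = 'DiseaseX' AND p_value < 5e-8"
# A purely syntactic translation of the question omits the threshold.
NAIVE = "SELECT gene FROM gwas WHERE disease = 'DiseaseX'"

conn = build_toy_db()
gold_rows = run(conn, GOLD)
naive_rows = run(conn, NAIVE)
score = execution_accuracy(conn, [(GOLD, GOLD), (NAIVE, GOLD)])
```

Here the naive query is valid SQL and runs without error, but it returns an extra gene and therefore counts as a miss, which is the kind of failure that a syntax-only evaluation would not catch.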