CL · Oct 8, 2025

Agent Bain vs. Agent McKinsey: A New Text-to-SQL Benchmark for the Business Domain

arXiv:2510.07309v2 · 2 citations · h-index: 4
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for real-world business intelligence by providing a domain-specific benchmark for evaluating text-to-SQL systems, though it is incremental because it builds on existing benchmarks.

The paper tackles text-to-SQL in business contexts by introducing CORGI, a new benchmark with synthetic databases and four increasingly complex query categories. It finds that LLM performance drops on higher-level questions and that, measured by execution success rate, CORGI is about 21% more difficult than the BIRD benchmark.

In the business domain, where data-driven decision making is crucial, text-to-SQL is fundamental for easy natural-language access to structured data. While recent LLMs have achieved strong performance in code generation, existing text-to-SQL benchmarks remain focused on factual retrieval of past records. We introduce CORGI, a new benchmark specifically designed for real-world business contexts. CORGI is composed of synthetic databases inspired by enterprises such as DoorDash, Airbnb, and Lululemon. It provides questions across four increasingly complex categories of business queries: descriptive, explanatory, predictive, and recommendational. This challenge calls for causal reasoning, temporal forecasting, and strategic recommendation, reflecting multi-level and multi-step agentic intelligence. We find that LLM performance drops on high-level questions, struggling to make accurate predictions and offer actionable plans. Based on execution success rate, the CORGI benchmark is about 21% more difficult than the BIRD benchmark. This highlights the gap between popular LLMs and the need for real-world business intelligence. We release a public dataset and evaluation framework, and a website for public submissions.
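The abstract compares benchmarks by execution success rate. As a rough illustration only, the sketch below assumes that metric means the fraction of model-generated SQL queries that run against the database without raising an error; CORGI's actual evaluation framework may define or compute it differently. The function `execution_success_rate`, the `orders` table, and the sample queries are hypothetical, and an in-memory SQLite database stands in for a benchmark database.

```python
import sqlite3


def execution_success_rate(db: sqlite3.Connection, queries: list[str]) -> float:
    """Fraction of queries that execute without raising a database error.

    Assumption: success means "runs", not "returns the correct answer".
    """
    if not queries:
        return 0.0
    succeeded = 0
    for sql in queries:
        try:
            db.execute(sql).fetchall()  # force full execution of the query
            succeeded += 1
        except sqlite3.Error:  # covers syntax errors, unknown columns, etc.
            pass
    return succeeded / len(queries)


# Toy stand-in database; the table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, city TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "Austin", 25.0), (2, "Boston", 40.0), (3, "Austin", 15.5)],
)

# Hypothetical model outputs for business questions about the toy table.
predicted = [
    "SELECT city, SUM(amount) FROM orders GROUP BY city",  # runs
    "SELECT city, AVG(amount) FROM orders GROUP BY city",  # runs
    "SELECT revenue FROM orders",                          # fails: no such column
    "SELEC * FROM orders",                                 # fails: syntax error
]
rate = execution_success_rate(conn, predicted)
print(f"execution success rate: {rate:.2f}")  # prints 0.50
```

Note that a metric of this kind only checks that a query runs; it says nothing about whether a predictive or recommendational answer is actually correct or useful, which is why harder question categories may need additional evaluation beyond execution.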

