LG | CL | Jan 16, 2025

Confidence Estimation for Error Detection in Text-to-SQL Systems

arXiv:2501.09527v1 | 11 citations | h-index: 5 | AAAI
Originality: Incremental advance
AI Analysis

This work addresses the challenge of giving users of text-to-SQL systems confidence estimates they can interpret. It is an incremental advance, since it builds on existing selective classification and calibration methods.

The study tackles unreliable confidence estimates in text-to-SQL systems by combining selective classifiers with entropy-based confidence estimation and calibration techniques. It finds that encoder-decoder T5 models are better calibrated than GPT-4 and Llama 3, which leads to better error detection for T5.

Text-to-SQL enables users to interact with databases through natural language, simplifying the retrieval and synthesis of information. Despite the success of large language models (LLMs) in converting natural language questions into SQL queries, their broader adoption is limited by two main challenges: achieving robust generalization across diverse queries and ensuring interpretative confidence in their predictions. To tackle these issues, our research investigates the integration of selective classifiers into Text-to-SQL systems. We analyse the trade-off between coverage and risk using entropy-based confidence estimation with selective classifiers and assess its impact on the overall performance of Text-to-SQL models. Additionally, we explore the models' initial calibration and improve it with calibration techniques so that confidence aligns more closely with accuracy. Our experimental results show that encoder-decoder T5 is better calibrated than in-context-learning GPT-4 and decoder-only Llama 3, and thus the external entropy-based selective classifier performs better with it. The study also reveals that, in terms of error detection, the selective classifier is more likely to detect errors associated with irrelevant questions than errors from incorrect query generations.
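The coverage-risk trade-off described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how token-level entropies are aggregated, so averaging over tokens is an assumption here, and the function names `mean_token_entropy` and `coverage_and_risk` are hypothetical. A query is accepted when its entropy falls at or below a threshold; coverage is the fraction of accepted queries, and risk is the error rate among them.

```python
import math
from typing import Sequence, Tuple


def mean_token_entropy(token_dists: Sequence[Sequence[float]]) -> float:
    """Average Shannon entropy (in nats) over per-token probability distributions.

    Assumption: averaging over tokens is one plausible aggregation;
    the abstract does not state which one the paper uses.
    """
    if not token_dists:
        raise ValueError("need at least one token distribution")
    total = 0.0
    for dist in token_dists:
        # Skip zero-probability entries, since p * log(p) -> 0 as p -> 0.
        total += -sum(p * math.log(p) for p in dist if p > 0.0)
    return total / len(token_dists)


def coverage_and_risk(
    entropies: Sequence[float],
    correct: Sequence[bool],
    threshold: float,
) -> Tuple[float, float]:
    """Accept predictions with entropy <= threshold.

    Returns (coverage, risk): the fraction of predictions accepted and
    the error rate among the accepted ones (0.0 if none are accepted).
    """
    if len(entropies) != len(correct) or not entropies:
        raise ValueError("inputs must be non-empty and of equal length")
    accepted = [ok for h, ok in zip(entropies, correct) if h <= threshold]
    coverage = len(accepted) / len(entropies)
    errors = sum(1 for ok in accepted if not ok)
    risk = errors / len(accepted) if accepted else 0.0
    return coverage, risk


# Toy data (made up for illustration): entropy per generated query,
# and whether that query was actually correct.
toy_entropies = [0.1, 0.2, 0.9, 1.5]
toy_correct = [True, True, False, True]

strict = coverage_and_risk(toy_entropies, toy_correct, threshold=0.5)
lenient = coverage_and_risk(toy_entropies, toy_correct, threshold=2.0)
```

On this toy data the strict threshold accepts half of the queries with no errors among them, while the lenient threshold accepts all of them and inherits the one incorrect query. Sweeping the threshold traces out the full risk-coverage curve.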

Code Implementations: 1 repo
