EviLink: Multi-Path Schema Linking with Uncertainty-Guided Evidence Acquisition for Large-Scale Text-to-SQL
For large-scale Text-to-SQL systems, EviLink improves the balance between schema completeness, relevance, and token cost, addressing a key bottleneck in handling ambiguous databases.
EviLink reframes schema linking in Text-to-SQL as uncertainty-aware inference over multiple plausible SQL paths, combining multi-hypothesis schema grounding with uncertainty-guided evidence acquisition. On Spider2-Snow, it achieves 90.15% field-level strict recall at an average of 123.30K tokens and improves downstream SQL generation under a fixed generator.
Schema linking is a difficult and important step in large-scale Text-to-SQL, where systems must identify a compact yet sufficient schema context from large and ambiguous databases. Existing methods often treat schema linking as deterministic selection around a single SQL path, but complex questions may admit multiple valid realizations with different schema needs. We reframe schema linking as uncertainty-aware schema-need inference over multiple plausible SQL paths, where the system distinguishes required schema items from path-dependent uncertain ones and acquires evidence only where needed. We instantiate this reframing with EviLink, which combines multi-hypothesis schema grounding with uncertainty-guided evidence acquisition. Experiments on BIRD-Dev and Spider2-Snow show that this perspective improves the balance among schema completeness, schema relevance, and token cost. On Spider2-Snow, EviLink achieves a field-level strict recall rate of 90.15%, uses 123.30K tokens on average, and improves downstream SQL generation under a fixed generator.
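The distinction between required and path-dependent schema items can be illustrated with a small sketch. It is not EviLink's actual procedure, which the abstract does not specify. It assumes, purely for illustration, that each plausible SQL path is summarized as a set of `table.column` strings, that an item is "required" when every path uses it, and that it is "uncertain" when only some paths do. The function name `partition_schema_needs` and the example schema are hypothetical.

```python
from typing import Iterable, Set, Tuple


def partition_schema_needs(
    hypotheses: Iterable[Iterable[str]],
) -> Tuple[Set[str], Set[str]]:
    """Split schema items into (required, uncertain) sets.

    Assumption for this sketch only: an item is "required" if every
    hypothesized SQL path uses it, and "uncertain" if only some do.
    """
    path_sets = [set(h) for h in hypotheses]
    if not path_sets:
        return set(), set()
    required = set.intersection(*path_sets)
    uncertain = set.union(*path_sets) - required
    return required, uncertain


# Three hypothetical SQL paths for one question, each reduced to the
# schema items it would touch.
hypotheses = [
    {"orders.customer_id", "orders.total", "customers.id"},
    {"orders.customer_id", "orders.total", "customers.id", "customers.region"},
    {"orders.customer_id", "invoices.amount", "customers.id"},
]

required, uncertain = partition_schema_needs(hypotheses)
print(sorted(required))
print(sorted(uncertain))
```

Under this simplification, only the items in `uncertain` would be candidates for further evidence acquisition, such as inspecting column descriptions or sample values, while the items in `required` go into the schema context directly.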