Polygon: Symbolic Reasoning for SQL using Conflict-Driven Under-Approximation Search
This addresses the problem of verifying and debugging SQL queries for database developers and researchers, representing a novel method rather than an incremental improvement.
The paper tackles the problem of efficiently generating inputs for SQL queries to satisfy given properties, such as disproving equivalence or disambiguating queries, using a symbolic reasoning engine called Polygon. The result shows that Polygon significantly outperforms prior techniques, as evaluated on over 30,000 benchmarks across two tasks.
We present a novel symbolic reasoning engine for SQL which can efficiently generate an input $I$ for $n$ queries $P_1, \cdots, P_n$, such that their outputs on $I$ satisfy a given property (expressed in SMT). This is useful in different contexts, such as disproving equivalence of two SQL queries and disambiguating a set of queries. Our first idea is to reason about an under-approximation of each $P_i$ -- that is, a subset of $P_i$'s input-output behaviors. While it makes our approach both semantics-aware and lightweight, this idea alone is incomplete (as a fixed under-approximation might miss some behaviors of interest). Therefore, our second idea is to perform search over an expressive family of under-approximations (which collectively cover all program behaviors of interest), thereby making our approach complete. We have implemented these ideas in a tool, Polygon, and evaluated it on over 30,000 benchmarks across two tasks (namely, SQL equivalence refutation and query disambiguation). Our evaluation results show that Polygon significantly outperforms all prior techniques.