Feature Representation Learning for NL2SQL Generation Based on Coupling and Decoupling
This work addresses a specific bottleneck in NL2SQL generation for database query systems, representing an incremental advancement over existing methods.
The paper addresses the NL2SQL task by modeling correlation features that prior methods overlook, both between the SELECT and WHERE clauses and among sub-tasks within a single clause. It proposes the CFCDC model, which combines decoupling and coupling methods and achieves significant improvements in logic precision and execution accuracy on the WikiSQL dataset.
The NL2SQL task involves parsing natural language statements into SQL queries. While most state-of-the-art methods treat NL2SQL as a slot-filling task and use feature representation learning techniques, they overlook explicit correlation features between the SELECT and WHERE clauses and implicit correlation features between sub-tasks within a single clause. To address this issue, we propose the Clause Feature Correlation Decoupling and Coupling (CFCDC) model, which uses a feature representation decoupling method to separate the SELECT and WHERE clauses at the parameter level. Next, we introduce a multi-task learning architecture to decouple the implicit correlation feature representations of the different SQL sub-tasks within a given clause. Moreover, we present an improved feature representation coupling module that integrates the decoupled tasks in the SELECT and WHERE clauses and predicts the final SQL query. Our proposed CFCDC model demonstrates excellent performance on the WikiSQL dataset, with significant improvements in logic precision and execution accuracy. The source code for the model will be publicly available on GitHub.
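To make the slot-filling formulation concrete, the following is a minimal sketch of how per-clause slot predictions could be assembled into a WikiSQL-style query. It is not the CFCDC implementation: the names SelectSlots, WhereSlots, and assemble_query are hypothetical, and the slot values are hard-coded where a model would supply predictions. The operator vocabularies follow the WikiSQL convention.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# WikiSQL-style operator vocabularies (slot index -> SQL token).
AGG_OPS = ["", "MAX", "MIN", "COUNT", "SUM", "AVG"]
COND_OPS = ["=", ">", "<"]


@dataclass
class SelectSlots:
    """Hypothetical container for the SELECT-clause sub-task outputs."""
    column: int       # index of the selected column
    aggregation: int  # index into AGG_OPS


@dataclass
class WhereSlots:
    """Hypothetical container for the WHERE-clause sub-task outputs."""
    # Each condition is (column index, operator index, value string).
    conditions: List[Tuple[int, int, str]] = field(default_factory=list)


def assemble_query(headers: List[str], select: SelectSlots,
                   where: WhereSlots, table: str = "table") -> str:
    """Combine the slots of both clauses into one SQL string."""
    col = headers[select.column]
    agg = AGG_OPS[select.aggregation]
    select_expr = f"{agg}({col})" if agg else col
    sql = f"SELECT {select_expr} FROM {table}"
    if where.conditions:
        parts = [f"{headers[c]} {COND_OPS[o]} '{v}'"
                 for c, o, v in where.conditions]
        sql += " WHERE " + " AND ".join(parts)
    return sql


# Illustrative inputs only; in practice a model predicts these slots.
headers = ["player", "team", "points"]
select = SelectSlots(column=2, aggregation=1)
where = WhereSlots(conditions=[(1, 0, "Lakers")])
print(assemble_query(headers, select, where))
```

In this sketch the SELECT and WHERE slots are filled independently and only meet in assemble_query, which loosely mirrors the idea of handling the two clauses separately and then combining them into the final query.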