Thinkquel: A Model Dedicated to Text-to-dbt Using Synthetic Data and a Span-Aware Objective
This addresses the problem of automating data transformations for data engineers and analysts, though it appears incremental as it builds on existing methods like reinforcement learning with novel adaptations.
The paper tackles the challenge of generating reliable, production-ready database queries from natural language by introducing Thinkquel, a fine-tuned model that achieves 93.2% execution success and 61.8% exact-result match on a test set, improving over the base model by 67.2% and 44.4% respectively.
Transforming natural-language requests into reliable, production-ready data transformations remains challenging: correctness depends on precise schema linking and warehouse-specific SQL dialects, while the strongest supervision available during training--execution success and result matching--are provided only at the sequence level. At the same time, assembling large, execution-validated corpora is costly, and token-level objectives misalign with these global signals, yielding unstable optimization and limited portability. We introduce Thinkquel, a fine-tuned model for producing robust, portable, and execution-validated database queries. Methodologies in Thinkquel integrates a novel synthetic data pipeline, TS-SQL, that leverages dbt as a portable intermediate representation with a span-aware reinforcement learning objective, and Token-Sequence GRPO (TS-GRPO), specifically designed to bridge the gap between token-level training signals and sequence-level execution rewards when finetuning LLMs. On the 500-example TS-SQL test set, Thinkquel (32B) reaches 93.2% execution success and 61.8% exact-result match with a two-stage SFT curriculum, improving over the base model by 67.2% (exec.) and 44.4% (match). In Spider (14B) experiments, TS-GRPO increases training stability and speeds convergence of the execution-match reward relative to GRPO and GSPO.