Zhao Tan

CL
h-index8
6papers
40citations
Novelty65%
AI Score47

6 Papers

CLApr 23, 2023
Divide and Prompt: Chain of Thought Prompting for Text-to-SQL

Xiping Liu, Zhao Tan

Chain-of-thought (CoT) prompting combined with large language models (LLMs) have achieved encouraging results on complex reasoning tasks. Text-to-SQL is a critical semantic parsing task that converts natural language questions into SQL statements, involving a complex reasoning process. However, there is little work about using CoT prompting to activate LLM's reasoning capabilities on Text-to-SQL tasks. In this work, we propose a new paradigm for prompting Text-to-SQL tasks, called Divide-and-Prompt, which first divides the task into subtasks, and then approach each subtask through CoT. We present 3 prompting-based methods to enhance the Text-to-SQL ability of LLMs. Experiments show that these prompts guide LLMs to generate Text-to-SQL with higher execution accuracy.

CLMay 10
LEAF-SQL: Level-wise Exploration with Adaptive Fine-graining for Text-to-SQL Skeleton Prediction

Zhao Tan, Xiping Liu, Qing Shu et al.

Text-to-SQL translates natural language questions into executable SQL queries, enabling intuitive database access for non-experts. While large language models achieve strong performance on Text-to-SQL with prompting, they still struggle with complex queries that involve deeply nested logic or multiple clauses. A widely used approach employs SQL skeletons--intermediate representations of query logic--to streamline generation, but existing methods are limited by their reliance on a single structural hypothesis and lack of progressive reasoning. To overcome these limitations, we propose LEAF-SQL, a novel framework that reframes skeleton prediction as a coarse-to-fine tree search process. LEAF-SQL enables systematic exploration of diverse structural hypotheses with adaptive refinement. Several key techniques are employed in LEAF-SQL: (1) a three-level skeleton hierarchy to guide the search, (2) a Skeleton Formulation Agent to generate diverse candidates, and (3) a Skeleton Evaluation Agent to efficiently prune the search space. This integrated design yields skeleton candidates that are both structurally diverse and granularity-adaptive, providing a stronger foundation for the SQL generation. Extensive experiments show that LEAF-SQL consistently improves the performance of various LLM backbones. On the official hidden test set of the challenging BIRD benchmark, our method achieves 71.6 execution accuracy, which outperforms leading search-based and skeleton-based methods, affirming its effectiveness for complex queries.

AIFeb 19
Sonar-TS: Search-Then-Verify Natural Language Querying for Time Series Databases

Zhao Tan, Yiji Zhao, Shiyu Wang et al.

Natural Language Querying for Time Series Databases (NLQ4TSDB) aims to assist non-expert users retrieve meaningful events, intervals, and summaries from massive temporal records. However, existing Text-to-SQL methods are not designed for continuous morphological intents such as shapes or anomalies, while time series models struggle to handle ultra-long histories. To address these challenges, we propose Sonar-TS, a neuro-symbolic framework that tackles NLQ4TSDB via a Search-Then-Verify pipeline. Analogous to active sonar, it utilizes a feature index to ping candidate windows via SQL, followed by generated Python programs to lock on and verify candidates against raw signals. To enable effective evaluation, we introduce NLQTSBench, the first large-scale benchmark designed for NLQ over TSDB-scale histories. Our experiments highlight the unique challenges within this domain and demonstrate that Sonar-TS effectively navigates complex temporal queries where traditional methods fail. This work presents the first systematic study of NLQ4TSDB, offering a general framework and evaluation standard to facilitate future research.

CLApr 21, 2024
EPI-SQL: Enhancing Text-to-SQL Translation with Error-Prevention Instructions

Xiping Liu, Zhao Tan

The conversion of natural language queries into SQL queries, known as Text-to-SQL, is a critical yet challenging task. This paper introduces EPI-SQL, a novel methodological framework leveraging Large Language Models (LLMs) to enhance the performance of Text-to-SQL tasks. EPI-SQL operates through a four-step process. Initially, the method involves gathering instances from the Spider dataset on which LLMs are prone to failure. These instances are then utilized to generate general error-prevention instructions (EPIs). Subsequently, LLMs craft contextualized EPIs tailored to the specific context of the current task. Finally, these context-specific EPIs are incorporated into the prompt used for SQL generation. EPI-SQL is distinguished in that it provides task-specific guidance, enabling the model to circumvent potential errors for the task at hand. Notably, the methodology rivals the performance of advanced few-shot methods despite being a zero-shot approach. An empirical assessment using the Spider benchmark reveals that EPI-SQL achieves an execution accuracy of 85.1\%, underscoring its effectiveness in generating accurate SQL queries through LLMs. The findings indicate a promising direction for future research, i.e. enhancing instructions with task-specific and contextualized rules, for boosting LLMs' performance in NLP tasks.

DBOct 18, 2024
KeyInst: Keyword Instruction for Improving SQL Formulation in Text-to-SQL

Xiping Liu, Zhao Tan

Text-to-SQL parsing involves the translation of natural language queries (NLQs) into their corresponding SQL commands. A principal challenge within this domain is the formulation of SQL queries that are not only syntactically correct but also semantically aligned with the natural language input. However, the intrinsic disparity between the NLQ and the SQL poses a significant challenge. In this research, we introduce Keyword Instruction (KeyInst), a novel method designed to enhance SQL formulation by Large Language Models (LLMs). KeyInst essentially provides guidance on pivotal SQL keywords likely to be part of the final query, thus facilitates a smoother SQL query formulation process. We explore two strategies for integrating KeyInst into Text-to-SQL parsing: a pipeline strategy and a single-pass strategy. The former first generates KeyInst for question, which are then used to prompt LLMs. The latter employs a fine-tuned model to concurrently generate KeyInst and SQL in one step. We developed StrucQL, a benchmark specifically designed for the evaluation of SQL formulation. Extensive experiments on StrucQL and other benchmarks demonstrate that KeyInst significantly improves upon the existing Text-to-SQL prompting techniques.

IRMay 31, 2018
Collaborative Multi-modal deep learning for the personalized product retrieval in Facebook Marketplace

Lu Zheng, Zhao Tan, Kun Han et al.

Facebook Marketplace is quickly gaining momentum among consumers as a favored customer-to-customer (C2C) product trading platform. The recommendation system behind it helps to significantly improve the user experience. Building the recommendation system for Facebook Marketplace is challenging for two reasons: 1) Scalability: the number of products in Facebook Marketplace is huge. Tens of thousands of products need to be scored and recommended within a couple hundred milliseconds for millions of users every day; 2) Cold start: the life span of the C2C products is very short and the user activities on the products are sparse. Thus it is difficult to accumulate enough product level signals for recommendation and we are facing a significant cold start issue. In this paper, we propose to address both the scalability and the cold-start issue by building a collaborative multi-modal deep learning based retrieval system where the compact embeddings for the users and the products are trained with the multi-modal content information. This system shows significant improvement over the benchmark in online and off-line experiments: In the online experiment, it increases the number of messages initiated by the buyer to the seller by +26.95%; in the off-line experiment, it improves the prediction accuracy by +9.58%.