A Hybrid Semantic Parsing Approach for Tabular Data Analysis
The work addresses the challenge of making data analysis accessible across domains and languages, though its contribution appears incremental, as it largely combines existing techniques.
The paper tackles the problem of translating natural language questions to SQL queries for tabular data analysis, achieving state-of-the-art performance on the WikiSQL benchmark and demonstrating promising results for multilingual and complex queries.
This paper presents a novel approach to translating natural language questions to SQL queries for given tables. The approach meets three requirements of a real-world data analysis application: cross-domain applicability, multilingual support, and quick start. Our proposed approach consists of: (1) a novel data abstraction step before the parser to make parsing table-agnostic; (2) a set of semantic rules for parsing abstracted data-analysis questions to intermediate logic forms as tree derivations to reduce the search space; (3) a neural model as a local scoring function on a span-based semantic parser for structured optimization and efficient inference. Experiments show that our approach outperforms state-of-the-art algorithms on WikiSQL, a large open benchmark dataset. We also achieve promising results on a small dataset of more complex queries in both English and Chinese, which demonstrates the approach's language-extension and quick-start capabilities.
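To make the idea of data abstraction in step (1) concrete, the following is a minimal sketch, not the paper's actual implementation. It assumes that abstraction means replacing mentions of column names and cell values in the question with typed placeholders, so that questions over different tables collapse to the same table-agnostic form. The function name `abstract_question`, the placeholder format `<COL1>` / `<VAL1>`, and the exact string-matching strategy are all hypothetical choices made for illustration.

```python
import re


def abstract_question(question, columns, rows):
    """Replace table-specific mentions with typed placeholders.

    Hypothetical illustration: column names become <COLn> and cell
    values become <VALn>. Matching is naive, case-insensitive,
    whole-word string matching, which is an assumption of this sketch.
    """
    # Collect every candidate mention together with its kind.
    mentions = {(name, "COL") for name in columns}
    for row in rows:
        for cell in row:
            mentions.add((str(cell), "VAL"))

    # Try longer mentions first; break ties alphabetically so the
    # result is deterministic.
    ordered = sorted(mentions, key=lambda m: (-len(m[0]), m[0], m[1]))

    counters = {"COL": 0, "VAL": 0}
    mapping = {}
    abstracted = question
    for text, kind in ordered:
        pattern = re.compile(r"\b" + re.escape(text) + r"\b", re.IGNORECASE)
        if pattern.search(abstracted):
            counters[kind] += 1
            placeholder = f"<{kind}{counters[kind]}>"
            abstracted = pattern.sub(placeholder, abstracted)
            # Remember what the placeholder stood for, so it can be
            # restored when the final SQL query is produced.
            mapping[placeholder] = text
    return abstracted, mapping


if __name__ == "__main__":
    q1, m1 = abstract_question(
        "What is the population of Paris?",
        columns=["city", "population"],
        rows=[["Paris", 2100000], ["Berlin", 3600000]],
    )
    q2, m2 = abstract_question(
        "What is the price of Widget?",
        columns=["product", "price"],
        rows=[["Widget", 5]],
    )
    print(q1, m1)
    print(q2, m2)
```

Under these assumptions, two questions about unrelated tables reduce to the same abstracted string, which is the property that would let a single rule set and scoring model operate across domains. The stored mapping allows the placeholders to be substituted back once a query has been derived.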