CL HC IR

May 7, 2020

Quda: Natural Language Queries for Visual Data Analytics

Siwei Fu, Kai Xiong, Xiaodong Ge, Siliang Tang, Wei Chen, Yingcai Wu

arXiv:2005.03257v51.630 citations

Originality: Incremental advance

AI Analysis

This work addresses the problem of improving natural language interfaces for data analysts by providing a foundational dataset, though it is incremental as it builds on existing V-NLI research.

The authors tackled the challenge of identifying analytic tasks from ambiguous natural language queries in visualization interfaces by creating Quda, a dataset of 14,035 annotated queries, and demonstrated its utility in applications.

The identification of analytic tasks from free text is critical for visualization-oriented natural language interfaces (V-NLIs) to suggest effective visualizations. However, it is challenging due to the ambiguous and complex nature of human language. To address this challenge, we present a new dataset, called Quda, that aims to help V-NLIs recognize analytic tasks from free-form natural language by training and evaluating cutting-edge multi-label classification models. Our dataset contains 14,035 diverse user queries, each annotated with one or more analytic tasks. We built it by first gathering seed queries with data analysts and then employing a large crowd workforce for paraphrase generation and validation. We demonstrate the usefulness of Quda through three applications. This work is the first attempt to construct a large-scale corpus for recognizing analytic tasks. We hope the release of Quda will boost the research and development of V-NLIs in data analysis and visualization.

