CL · HC · IR · May 7, 2020

Quda: Natural Language Queries for Visual Data Analytics

arXiv:2005.03257v5 · 30 citations
Originality: Incremental advance
AI Analysis

This work addresses the problem of improving natural language interfaces for data analysts by providing a foundational dataset, though it is incremental as it builds on existing V-NLI research.

The authors tackled the challenge of identifying analytic tasks from ambiguous natural language queries in visualization interfaces by creating Quda, a dataset of 14,035 annotated queries, and demonstrated its utility in applications.

The identification of analytic tasks from free text is critical for visualization-oriented natural language interfaces (V-NLIs) to suggest effective visualizations. However, it is challenging due to the ambiguous and complex nature of human language. To address this challenge, we present a new dataset, called Quda, that aims to help V-NLIs recognize analytic tasks from free-form natural language by training and evaluating cutting-edge multi-label classification models. Our dataset contains 14,035 diverse user queries, each annotated with one or more analytic tasks. We built it by first gathering seed queries from data analysts and then employing a large crowd workforce for paraphrase generation and validation. We demonstrate the usefulness of Quda through three applications. This work is the first attempt to construct a large-scale corpus for recognizing analytic tasks. We hope the release of Quda will boost the research and development of V-NLIs in data analysis and visualization.
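Because each query can carry one or more analytic tasks, recognizing them is a multi-label problem. The sketch below is only an illustration of that framing, not the authors' pipeline: the task vocabulary is assumed to be the ten low-level analytic tasks of Amar et al. (the abstract does not list the label set), the example queries are invented rather than taken from Quda, and `encode` and `exact_match` are hypothetical helper names.

```python
# Assumed label set for illustration: the ten low-level analytic tasks
# of Amar et al. The abstract itself does not enumerate the labels.
TASKS = [
    "retrieve value", "filter", "compute derived value", "find extremum",
    "sort", "determine range", "characterize distribution",
    "find anomalies", "cluster", "correlate",
]


def encode(labels):
    """Turn a collection of task names into a multi-hot vector over TASKS."""
    unknown = set(labels) - set(TASKS)
    if unknown:
        raise ValueError(f"unknown task(s): {sorted(unknown)}")
    return [1 if task in labels else 0 for task in TASKS]


def exact_match(gold, pred):
    """Fraction of queries whose predicted label vector equals the gold one."""
    hits = sum(1 for g, p in zip(gold, pred) if g == p)
    return hits / len(gold)


# Invented example queries (not from Quda), each with one or more tasks.
examples = [
    ("Which country has the highest GDP?", {"find extremum"}),
    ("Rank the movies released after 2010 by box office revenue.",
     {"filter", "sort"}),
]

# One multi-hot vector per query; a classifier would be trained to predict these.
gold = [encode(labels) for _, labels in examples]
```

A multi-hot vector lets a single query such as the second one activate both "filter" and "sort", which a single-label scheme could not express. Exact match is one strict way to score predictions; per-label precision and recall are common alternatives.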

