Collecting and Characterizing Natural Language Utterances for Specifying Data Visualizations
This work addresses the need for better natural language interfaces in data visualization tools by providing a foundational dataset for researchers and developers. Its contribution is incremental: it focuses on data collection rather than novel methods.
The study tackles the lack of empirical understanding of how people use natural language to specify data visualizations. It collected utterances from 102 participants, each of whom was shown ten visualizations for a given dataset, and yields a curated corpus of utterances for evaluating existing systems and developing new ones.
Natural language interfaces (NLIs) for data visualization are becoming increasingly popular both in academic research and in commercial software. Yet, there is a lack of empirical understanding of how people specify visualizations through natural language. To bridge this gap, we conducted an online study with 102 participants. We showed participants a series of ten visualizations for a given dataset and asked them to provide utterances they would pose to generate the displayed charts. The curated list of utterances generated from the study is provided below. This corpus of utterances can be used to evaluate existing NLIs for data visualization as well as for creating new systems and models to generate visualizations from natural language utterances.