HCAI
Dec 24, 2021

nvBench: A Large-Scale Synthesized Dataset for Cross-Domain Natural Language to Visualization Task

arXiv:2112.12926v1
45 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the data bottleneck facing researchers and developers working on cross-domain NL2VIS systems, though it is incremental in that it synthesizes its data from existing (NL, SQL) benchmarks.

The authors address the lack of large-scale benchmarks for natural language to visualization (NL2VIS) tasks by creating nvBench, a dataset of 25,750 (NL, VIS) pairs from 750 tables across 105 domains. The dataset was validated by experts and crowd workers, and it enabled the training of deep learning models to advance the field.

NL2VIS, which translates natural language (NL) queries into corresponding visualizations (VIS), has attracted growing attention from both commercial visualization vendors and academic researchers. In the last few years, advanced deep learning-based models have achieved human-like abilities in many natural language processing (NLP) tasks, which suggests that deep learning-based techniques are a good choice for advancing the field of NL2VIS. However, a major obstacle is the lack of benchmarks with many (NL, VIS) pairs. We present nvBench, the first large-scale NL2VIS benchmark, containing 25,750 (NL, VIS) pairs from 750 tables over 105 domains, synthesized from (NL, SQL) benchmarks to support the cross-domain NL2VIS task. The quality of nvBench has been extensively validated by 23 experts and 300+ crowd workers. Deep learning-based models trained on nvBench demonstrate that nvBench can advance the field of NL2VIS.

