CL · Sep 14, 2023

Automatic Data Visualization Generation from Chinese Natural Language Questions

Yan Ge, Victor Junqiu Wei, Yuanfeng Song, Jason Chen Zhang, Raymond Chi-Wing Wong

arXiv:2309.07650v1 · 17.184 citations · h-index: 13

Originality: Synthesis-oriented

AI Analysis

This work addresses the need for accessible data visualization tools for Chinese speakers, but the contribution is incremental: it adapts existing methods to a new language.

The paper tackles automatic generation of data visualizations from Chinese natural language questions, a gap in existing research, which has focused on English. It presents a new dataset and a model that integrates multilingual BERT with n-gram information, and its experiments show that the dataset is challenging.

Data visualization has emerged as an effective tool for extracting insights from massive datasets. Because the programming languages used for data visualization are hard to work with, automatic data visualization generation from natural language (Text-to-Vis) is becoming increasingly popular. Despite the plethora of research on English Text-to-Vis, no studies have yet addressed data visualization generation from questions in Chinese. Motivated by this, we propose a Chinese Text-to-Vis dataset and present our first attempt to tackle this problem. Our model integrates multilingual BERT as the encoder, boosts cross-lingual ability, and infuses $n$-gram information into word representation learning. Our experimental results show that our dataset is challenging and deserves further research.
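
As a rough illustration of what n-gram information can look like for Chinese text, which has no whitespace word boundaries, the sketch below extracts contiguous character n-grams from a question. It is not the paper's model: the function name `char_ngrams` and the sample question are assumptions made for illustration, and the abstract does not say how the authors combine n-grams with multilingual BERT.

```python
from typing import List


def char_ngrams(text: str, n: int) -> List[str]:
    """Return all contiguous character n-grams of `text`, in order."""
    if n <= 0:
        raise ValueError("n must be positive")
    # A string shorter than n yields an empty list.
    return [text[i:i + n] for i in range(len(text) - n + 1)]


# Hypothetical question, roughly "Show the sales for each city".
question = "显示每个城市的销售额"

bigrams = char_ngrams(question, 2)
trigrams = char_ngrams(question, 3)

print(bigrams)
print(trigrams)
```

Character-level n-grams are used here only because they need no word segmenter; a word-level variant would first require a Chinese tokenizer.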
