Decoding the Text Encoding
This work addresses the need for better design and statistical analysis of word cloud visualizations, though it is an incremental step in visualization techniques.
The paper tackles the problem that word clouds do not faithfully convey their underlying data. It proposes a fully automatic algorithm that decodes a word cloud and extracts the raw data as (word, value) pairs, achieving effective extraction with a low error rate.
Word clouds are among the most popular and widely used text visualizations of recent years. Despite their visual appeal and the ease of producing them, they do not give an accurate picture of how the underlying data is distributed. Redesigning word clouds is therefore important, both to improve their design choices and to enable further statistical analysis of the data. In this paper, we propose a fully automatic redesign algorithm for word cloud visualizations. Our method decodes an input word cloud and recovers the raw data as a list of (word, value) pairs. To the best of our knowledge, our work is the first attempt to extract raw data from word cloud visualizations. We evaluated the method both qualitatively and quantitatively. The results of our experiments show that our algorithm extracts the words and their weights effectively, with a low error rate.
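To make the target output concrete, the sketch below shows one way the last decoding step could turn detected words into (word, value) pairs. It is a minimal illustration under stated assumptions, not the authors' algorithm: it assumes an earlier text-detection stage (not shown) has already produced each word with its rendered font height, and that the word cloud generator scaled font size linearly with value. Because the absolute scale cannot be recovered from the image alone, values are normalized so the largest word receives `max_value`. The names `decode_weights` and `mean_relative_error` are hypothetical, and the error measure is just one plausible choice, not the metric used in the paper.

```python
from typing import List, Tuple


def decode_weights(detected: List[Tuple[str, float]],
                   max_value: float = 1.0) -> List[Tuple[str, float]]:
    """Hypothetical helper: map (word, font_height) detections to (word, value).

    Assumes font height is linearly proportional to the encoded value, so each
    value is recovered relative to the tallest word.
    """
    if not detected:
        return []
    tallest = max(height for _, height in detected)
    if tallest <= 0:
        raise ValueError("font heights must be positive")
    pairs = [(word, max_value * height / tallest) for word, height in detected]
    # List words by decreasing recovered value, like a frequency table.
    return sorted(pairs, key=lambda p: p[1], reverse=True)


def mean_relative_error(extracted: List[Tuple[str, float]],
                        truth: List[Tuple[str, float]]) -> float:
    """Hypothetical metric: mean |extracted - true| / true over ground-truth words.

    A word missing from the extraction is treated as value 0 (relative error 1).
    """
    found = dict(extracted)
    errors = [abs(found.get(word, 0.0) - value) / value for word, value in truth]
    return sum(errors) / len(errors)
```

For example, `decode_weights([("data", 48.0), ("cloud", 24.0), ("word", 36.0)])` returns `[("data", 1.0), ("word", 0.75), ("cloud", 0.5)]`. A real word cloud generator may use a nonlinear (for example square-root) size mapping, in which case the linear inversion above would need to be replaced.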