Tensor Fields for Data Extraction from Chart Images: Bar Charts and Scatter Plots
The work addresses the need for automated chart analysis in data science, though its contribution is incremental because it covers only specific chart types.
The paper tackles automated data extraction from raster images of bar charts and scatter plots by proposing a computational model based on positive semidefinite second-order tensor fields; its results show that tensor voting is effective for this task.
Charts are an essential part of both graphicacy (graphical literacy) and statistical literacy. As chart understanding has become increasingly relevant in data science, automating chart analysis by processing raster images of charts has become a significant problem. Automated chart reading involves data extraction and contextual understanding of the data from chart images. In this paper, we perform the first step of determining the computational model of chart images for data extraction for selected chart types, namely bar charts and scatter plots. We demonstrate the use of positive semidefinite second-order tensor fields as an effective model. We identify an appropriate tensor field as the model and propose a methodology that uses the extraction of its degenerate points for data extraction from chart images. Our results show that tensor voting is effective for data extraction from bar charts and scatter plots, as well as from histograms as a special case of bar charts.
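To make the two central terms concrete, the sketch below builds a positive semidefinite second-order tensor field over a synthetic bar image and flags near-degenerate points, that is, pixels where the two eigenvalues of the tensor are nearly equal. It is an illustration under stated assumptions, not the paper's method: it swaps in a box-smoothed gradient structure tensor in place of tensor voting, and the function names (`structure_tensor`, `box_smooth`, `near_degenerate`), the synthetic image, and the thresholds `tol` and `rel_floor` are all hypothetical choices made for this example.

```python
import numpy as np

def box_smooth(a, r=1):
    """Average each pixel over a (2r+1) x (2r+1) window, padding with edge values."""
    p = np.pad(a, r, mode="edge")
    n = 2 * r + 1
    out = np.zeros_like(a, dtype=float)
    for dy in range(n):
        for dx in range(n):
            out += p[dy:dy + a.shape[0], dx:dx + a.shape[1]]
    return out / n**2

def structure_tensor(img, r=1):
    """Smoothed gradient outer product per pixel.

    Each 2x2 tensor [[txx, txy], [txy, tyy]] is an average of rank-one
    outer products g g^T, so it is positive semidefinite by construction.
    NOTE: this is a stand-in for the paper's tensor voting field.
    """
    gy, gx = np.gradient(img.astype(float))  # derivatives along rows, columns
    return box_smooth(gx * gx, r), box_smooth(gx * gy, r), box_smooth(gy * gy, r)

def eigenvalues(txx, txy, tyy):
    """Closed-form eigenvalues (l1 >= l2) of a symmetric 2x2 tensor field."""
    half_trace = (txx + tyy) / 2.0
    disc = np.sqrt(((txx - tyy) / 2.0) ** 2 + txy**2)
    return half_trace + disc, half_trace - disc

def anisotropy(l1, l2, eps=1e-12):
    """0 where the eigenvalues are equal (degenerate), 1 where only one is nonzero."""
    return (l1 - l2) / (l1 + l2 + eps)

def near_degenerate(l1, l2, tol=0.3, rel_floor=0.05):
    """Pixels with a non-negligible tensor whose eigenvalues are nearly equal."""
    trace = l1 + l2
    significant = trace > rel_floor * trace.max()
    return significant & (anisotropy(l1, l2) < tol)

# Hypothetical input: a single filled bar on an empty background.
img = np.zeros((20, 20))
img[5:15, 8:12] = 1.0

l1, l2 = eigenvalues(*structure_tensor(img))
anis = anisotropy(l1, l2)
mask = near_degenerate(l1, l2)
print(np.argwhere(mask))  # pixel coordinates (row, col) of near-degenerate points
```

In this toy setting the near-degenerate points sit at the bar's corners, where horizontal and vertical gradients contribute equally, while points along a straight edge stay strongly anisotropic. That gives some intuition for why degenerate points of a suitable tensor field could localize the features that carry data in a bar chart; how the paper's tensor field and extraction step are actually defined is not specified in this abstract.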