A Survey on Figure Classification Techniques in Scientific Documents
This survey addresses the need for automated figure classification to improve information extraction from scientific documents. Its contribution is incremental: it synthesizes existing research rather than introducing new methods.
This survey paper tackles the problem of classifying figures in scientific documents into categories like tables and plots, and it reviews existing methods and datasets while identifying research gaps for future work.
Figures convey essential information visually and provide an effective means of communicating scientific facts. Recently there have been many efforts to extract data directly from figures, specifically from tables, diagrams, and plots, using different Artificial Intelligence and Machine Learning techniques. This is because extracting information from figures could lead to deeper insights into the concepts highlighted in scientific documents. In this survey paper, we systematically categorize figures into five classes (tables, photos, diagrams, maps, and plots) and then present a critical review of the existing methodologies and datasets that address the problem of figure classification. Finally, we identify the current research gaps and provide possible directions for further research on figure classification.