CV, DL, IR, LG · Sep 19, 2023

MatGD: Materials Graph Digitizer

arXiv:2311.12806v1 · 8 citations · h-index: 4
Originality: Incremental advance
AI Analysis

MatGD helps materials science researchers extract structured data from figures, which supports collecting data to train machine learning models for materials prediction and discovery. The advance is incremental because it builds on existing figure mining methods.

The researchers tackled the problem of digitizing data lines from scientific graphs in materials science publications. They developed MatGD, which reaches over 99% accuracy in legend detection and 66% accuracy in data line separation, outperforming existing tools.

We have developed MatGD (Materials Graph Digitizer), a tool for digitizing data lines from scientific graphs. The algorithm behind the tool consists of four steps: (1) identifying graphs within subfigures, (2) separating axes and data sections, (3) discerning the data lines by eliminating irrelevant graph objects and matching them with the legend, and (4) extracting and saving the data. From 62,534 papers in the areas of batteries, catalysis, and MOFs, 501,045 figures were mined. Our tool achieved over 99% accuracy in legend marker and text detection. Its accuracy for data line separation was 66%, which is much higher than that of other existing figure mining tools. We believe that this tool will be integral to collecting both past and future data from publications, and that these data can be used to train machine learning models that enhance materials prediction and new materials discovery.
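To make steps (3) and (4) more concrete, here is a minimal sketch of one simple way to separate data lines and convert them to data values. It is not MatGD's implementation, which the abstract does not detail. It assumes that each data line has a single exact RGB color, that both axes are linear, and that two calibration points per axis are already known. The function names (`make_axis_mapper`, `separate_lines_by_color`, `digitize`) and the toy image are hypothetical.

```python
from collections import defaultdict


def make_axis_mapper(pixel_a, value_a, pixel_b, value_b):
    """Return a function mapping a pixel coordinate to a data value.

    Assumes a linear axis calibrated by two known (pixel, value) pairs.
    """
    if pixel_a == pixel_b:
        raise ValueError("calibration pixels must differ")
    scale = (value_b - value_a) / (pixel_b - pixel_a)
    return lambda pixel: value_a + (pixel - pixel_a) * scale


def separate_lines_by_color(image, background=(255, 255, 255)):
    """Group non-background pixels by exact RGB color.

    `image` is a list of rows, each row a list of (r, g, b) tuples.
    Returns {color: [(column, row), ...]}.
    """
    groups = defaultdict(list)
    for row_idx, row in enumerate(image):
        for col_idx, color in enumerate(row):
            if color != background:
                groups[color].append((col_idx, row_idx))
    return dict(groups)


def digitize(image, x_map, y_map, background=(255, 255, 255)):
    """Return {color: [(x, y), ...]} in data coordinates, sorted by x."""
    series = {}
    for color, pixels in separate_lines_by_color(image, background).items():
        # Average the rows in each column so a thick stroke gives one point per x.
        rows_by_col = defaultdict(list)
        for col, row in pixels:
            rows_by_col[col].append(row)
        series[color] = [
            (x_map(col), y_map(sum(rows) / len(rows)))
            for col, rows in sorted(rows_by_col.items())
        ]
    return series


# Toy 3x3 "plot": one red line rising from bottom-left to top-right.
W, R = (255, 255, 255), (255, 0, 0)
toy_image = [
    [W, W, R],
    [W, R, W],
    [R, W, W],
]
x_map = make_axis_mapper(0, 0.0, 2, 2.0)   # pixel columns 0..2 -> x in 0..2
y_map = make_axis_mapper(2, 0.0, 0, 10.0)  # pixel row 2 is y=0, row 0 is y=10
toy_series = digitize(toy_image, x_map, y_map)
```

Real figures need more than this: anti-aliasing blurs colors, lines overlap, and markers, grid lines, and legends have to be removed first, which is what step (3) of the abstract addresses.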

