CV, QM | May 23, 2022

MolMiner: You only look once for chemical structure recognition

arXiv:2205.11016v1 | 25 citations | h-index: 37
Originality: Incremental advance
AI Analysis

This work addresses the need to automatically convert chemical structure depictions in scientific documents into machine-readable form, which matters to researchers and industries facing large backlogs of printed literature. The contribution appears incremental, since it applies existing deep learning methods to a specific domain.

The authors tackle Optical Chemical Structure Recognition (OCSR), the problem of translating printed molecular structure depictions into machine-readable formats. They developed MolMiner, software that uses deep neural networks for semantic segmentation and object detection, and report state-of-the-art performance on four benchmark datasets.

Molecular structures are always depicted in 2D printed form in scientific documents such as journal papers and patents. However, these 2D depictions are not machine-readable. Due to a backlog of decades and a growing volume of this printed literature, there is high demand for translating printed depictions into machine-readable formats, a task known as Optical Chemical Structure Recognition (OCSR). Most OCSR systems developed over the last three decades follow a rule-based approach in which the key step, vectorization of the depiction, interprets vectors and nodes as bonds and atoms. Here, we present MolMiner, a practical software tool built primarily on deep neural networks originally developed for semantic segmentation and object detection, which recognizes atom and bond elements in documents. The recognized elements can then be connected into a molecular graph with a distance-based construction algorithm. We carefully evaluate our software on four benchmark datasets and obtain state-of-the-art performance. Various real application scenarios are also tested, yielding satisfactory outcomes. Free downloads are available for Mac (https://molminer-cdn.iipharma.cn/pharma-mind/artifact/latest/mac/PharmaMind-mac-latest-setup.dmg) and Windows (https://molminer-cdn.iipharma.cn/pharma-mind/artifact/latest/win/PharmaMind-win-latest-setup.exe).
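The abstract does not spell out the distance-based construction step, so the following Python sketch is only an illustration of the general idea, not MolMiner's actual algorithm. It assumes the detector yields atom centres and bond segment endpoints in pixel coordinates; the names `Atom`, `Bond`, `nearest_atom`, `build_graph`, and the `max_dist` threshold are all hypothetical.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Atom:
    label: str   # element symbol predicted by the detector, e.g. "C"
    x: float     # centre of the detected atom, in pixels (assumed output format)
    y: float


@dataclass
class Bond:
    order: int   # 1 = single, 2 = double, 3 = triple
    x1: float    # one end of the detected bond segment (assumed output format)
    y1: float
    x2: float    # the other end
    y2: float


def nearest_atom(atoms, x, y, max_dist):
    """Return the index of the atom closest to (x, y), or None if none lies within max_dist."""
    best_idx, best_d = None, max_dist
    for i, a in enumerate(atoms):
        d = hypot(a.x - x, a.y - y)
        if d <= best_d:
            best_idx, best_d = i, d
    return best_idx


def build_graph(atoms, bonds, max_dist=15.0):
    """Attach each bond end to its nearest atom and return sorted edges (i, j, order)."""
    edges = []
    for b in bonds:
        i = nearest_atom(atoms, b.x1, b.y1, max_dist)
        j = nearest_atom(atoms, b.x2, b.y2, max_dist)
        if i is None or j is None or i == j:
            continue  # skip bonds that cannot be matched to two distinct atoms
        edges.append((min(i, j), max(i, j), b.order))
    return sorted(edges)


# Toy detections for a three-atom chain drawn left to right: C - C - O
atoms = [Atom("C", 0, 0), Atom("C", 50, 0), Atom("O", 100, 0)]
bonds = [Bond(1, 3, 1, 47, -1), Bond(1, 52, 0, 98, 2)]
edges = build_graph(atoms, bonds)
print(edges)
```

In this sketch the threshold guards against attaching a bond to a distant atom when a detection is missing; a real system would likely scale it with the image resolution or the median bond length.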

