Exploring the Role of Artificial Intelligence and Machine Learning in Process Optimization for Chemical Industry
This work provides benchmarks for tool development in chemical informatics by assessing how existing OCSR methods handle image degradation, addressing a specific bottleneck for researchers and industry professionals.
The study evaluated the robustness of Optical Chemical Structure Recognition (OCSR) tools on chemical structure images degraded by compression, noise, distortion, and black overlays. Performance varied significantly across tools: MolScribe achieved 94.6% accuracy on undamaged images and 55.8% under heavy compression.
Optical Chemical Structure Recognition (OCSR) converts images of chemical structures into machine-readable formats so that they can be stored and queried efficiently in chemical databases. Although a number of OCSR tools have been developed, little is known about how well they perform when image quality is degraded. This work presents a new dataset of chemical structure images that have been systematically degraded by compression, noise, distortion, and black overlays. Publicly available OCSR tools were tested on each subset to determine how robust they are under these adverse conditions. The results show notable variation in performance and highlight each tool's strengths and weaknesses. MolScribe had the highest recognition rate on undamaged images (94.6%) and performed best under heavy compression (55.8% at 99% compression). MolVec performed exceptionally well against noise and black overlay (86.8% at 40%), although its performance declined under extreme distortion (<70%). Decimer was highly sensitive to noise and black overlay, with recognition rates below 30%, while Imago had the lowest baseline accuracy (73.6%). This assessment offers new insight into how OCSR tools perform as image quality deteriorates and provides useful benchmarks for future tool development.
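The evaluation described above can be illustrated with a minimal Python sketch. It is not the study's pipeline. The function names are hypothetical, the image is a plain grayscale grid of values from 0 to 255, and it assumes that a degradation level such as 40% means the fraction of image area (overlay) or of pixels (noise) affected, which the abstract does not specify. The recognition rate is assumed here to be the share of predictions that exactly match a reference identifier string.

```python
import random


def _check_fraction(fraction):
    # Degradation levels are expressed as fractions in [0, 1] (assumption).
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")


def add_black_overlay(image, fraction, rng):
    """Return a copy of `image` with a black rectangle covering about `fraction` of its area."""
    _check_fraction(fraction)
    degraded = [row[:] for row in image]
    if fraction == 0:
        return degraded
    height, width = len(image), len(image[0])
    # Scale both sides by sqrt(fraction) so the rectangle area is roughly fraction * total area.
    box_h = max(1, round(height * fraction ** 0.5))
    box_w = max(1, round(width * fraction ** 0.5))
    top = rng.randint(0, height - box_h)
    left = rng.randint(0, width - box_w)
    for y in range(top, top + box_h):
        for x in range(left, left + box_w):
            degraded[y][x] = 0
    return degraded


def add_salt_and_pepper(image, fraction, rng):
    """Return a copy of `image` with `fraction` of its pixels set to pure black or pure white."""
    _check_fraction(fraction)
    degraded = [row[:] for row in image]
    height, width = len(image), len(image[0])
    n_noisy = round(height * width * fraction)
    # Sample distinct pixel positions so exactly n_noisy pixels are overwritten.
    for index in rng.sample(range(height * width), n_noisy):
        y, x = divmod(index, width)
        degraded[y][x] = rng.choice((0, 255))
    return degraded


def recognition_rate(predicted, reference):
    """Percentage of predictions that exactly match the reference identifier.

    A prediction of None stands for a tool failure and counts as a miss.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("at least one reference structure is required")
    hits = sum(p is not None and p == r for p, r in zip(predicted, reference))
    return 100.0 * hits / len(reference)
```

In practice, each degraded image would be passed to an OCSR tool, and the returned structure would be normalized (for example to a canonical identifier) before comparison, since two different strings can describe the same molecule. That normalization step is outside this sketch.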