LG · CHEM-PH · Nov 14, 2020

Deep Spatial Learning with Molecular Vibration

arXiv:2011.07200v1 · 1 citation
Originality: Incremental advance
AI Analysis

This paper addresses data scarcity in molecular machine learning, particularly for tasks where computational chemistry data is limited. The advance appears incremental, as it builds on physics-informed augmentation techniques.

The paper tackles over-fitting caused by data scarcity in molecular science. It proposes extracting molecular features and distorting them to augment the training data. In predicting nanofiltration membrane properties, this lowers the relative error from 16.34% to 6.71% and raises the coefficient of determination from 0.16 to 0.75.
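For readers unfamiliar with the two reported metrics, the sketch below shows one common way to compute them. The summary does not state the paper's exact definition of relative error, so the mean absolute relative error used here is an assumption, and the function names and toy values are hypothetical.

```python
def mean_relative_error(y_true, y_pred):
    """Mean of |prediction - target| / |target|, returned as a fraction.

    Assumed definition: 0.0671 would correspond to a reported 6.71%.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - t) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy values for illustration only; these are not data from the paper.
y_true = [1.0, 2.0, 4.0]
y_pred = [1.1, 1.8, 4.0]
mre = mean_relative_error(y_true, y_pred)
r2 = r_squared(y_true, y_pred)
```

A lower relative error and a coefficient of determination closer to 1 both indicate predictions that track the measured values more closely.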

Over-fitting caused by data scarcity greatly limits the application of machine learning to molecules. Because manufacturing processes differ, computational chemistry methods cannot always supply large datasets for some tasks, which leaves machine learning algorithms short of data. Here we propose to extract the natural features of molecular structures and rationally distort them to augment the available data. This allows a machine learning project to draw on physics-informed augmentation for a significant boost in predictive accuracy. The method was verified on predicting the rejection rate and flux of thin-film polyamide nanofiltration membranes, where the relative error dropped from 16.34% to 6.71% and the coefficient of determination rose from 0.16 to 0.75. The proposed deep spatial learning with molecular vibration is therefore broadly instructive for molecular science. Experimental comparison demonstrates its superiority over common learning algorithms.
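The idea of distorting molecular structure to create extra training samples can be illustrated with a minimal sketch. The abstract does not specify how the distortion is performed, so the Gaussian displacement of atomic coordinates below is only a stand-in, not the authors' method. The function name, the `sigma` value, the approximate water geometry, and the label are all hypothetical.

```python
import numpy as np


def augment_by_vibration(coords, label, n_copies=4, sigma=0.05, seed=0):
    """Return n_copies jittered versions of one molecule, each keeping its label.

    coords: (n_atoms, 3) Cartesian coordinates in angstroms.
    sigma:  standard deviation of the per-coordinate displacement (assumed value).
    """
    coords = np.asarray(coords, dtype=float)
    if coords.ndim != 2 or coords.shape[1] != 3:
        raise ValueError("coords must have shape (n_atoms, 3)")
    rng = np.random.default_rng(seed)
    # Small random displacements stand in for vibration-like distortions.
    noise = rng.normal(0.0, sigma, size=(n_copies,) + coords.shape)
    distorted = coords[None, :, :] + noise
    # The target property is assumed unchanged by a small distortion.
    labels = np.full(n_copies, label, dtype=float)
    return distorted, labels


# Approximate water geometry (O, H, H) with a made-up target value.
water = [[0.000, 0.000, 0.117],
         [0.000, 0.757, -0.469],
         [0.000, -0.757, -0.469]]
samples, labels = augment_by_vibration(water, label=0.93, n_copies=5)
```

Each distorted copy is treated as a new training example with the original label, which enlarges a small dataset without new measurements. Whether a given distortion amplitude preserves the label is a modeling assumption that would need checking for each task.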

Code Implementations: 1 repo