LG | CHEM-PH | Aug 8, 2024

Advancing Molecular Machine Learning Representations with Stereoelectronics-Infused Molecular Graphs

arXiv:2408.04520v2 | 13 citations | h-index: 22
AI Analysis

This work addresses the need for higher-fidelity molecular representations in machine learning for complex prediction tasks, offering a novel approach with potential applications in molecular design.

The paper tackles the problem of information-sparse molecular representations in machine learning by infusing quantum-chemical stereoelectronic effects into molecular graphs. The authors report improved performance for molecular property prediction and extrapolation to larger systems such as proteins.

Molecular representation is a critical element in our understanding of the physical world and the foundation for modern molecular machine learning. Previous molecular machine learning models have employed strings, fingerprints, global features, and simple molecular graphs that are inherently information-sparse representations. However, as the complexity of prediction tasks increases, the molecular representation needs to encode higher fidelity information. This work introduces a novel approach to infusing quantum-chemical-rich information into molecular graphs via stereoelectronic effects, enhancing expressivity and interpretability. Learning to predict the stereoelectronics-infused representation with a tailored double graph neural network workflow enables its application to any downstream molecular machine learning task without expensive quantum chemical calculations. We show that the explicit addition of stereoelectronic information significantly improves the performance of message-passing 2D machine learning models for molecular property prediction. We show that the learned representations trained on small molecules can accurately extrapolate to much larger molecular structures, yielding chemical insight into orbital interactions for previously intractable systems, such as entire proteins, opening new avenues of molecular design. Finally, we have developed a web application (simg.cheme.cmu.edu) where users can rapidly explore stereoelectronic information for their own molecular systems.
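To make the contrast between an information-sparse graph and an enriched one concrete, the following is a minimal sketch of one round of sum-aggregation message passing on a toy molecular graph. It is not the authors' double graph neural network workflow. The function `message_passing_step`, the choice of ethanol's heavy atoms, and the use of a per-atom lone-pair count as a stand-in for stereoelectronic information are all assumptions made for illustration only.

```python
from typing import List, Tuple


def message_passing_step(
    features: List[List[float]], bonds: List[Tuple[int, int]]
) -> List[List[float]]:
    """One round of sum aggregation: each atom adds its bonded
    neighbours' feature vectors to its own (hypothetical helper)."""
    neighbours = {i: [] for i in range(len(features))}
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)

    updated = []
    for i, own in enumerate(features):
        agg = list(own)
        for j in neighbours[i]:
            agg = [x + y for x, y in zip(agg, features[j])]
        updated.append(agg)
    return updated


# Toy graph: heavy atoms of ethanol, C0 - C1 - O2 (hydrogens omitted).
bonds = [(0, 1), (1, 2)]

# Sparse representation: atomic number only.
sparse = [[6.0], [6.0], [8.0]]

# Enriched representation: append a lone-pair count per atom.
# This feature is an illustrative stand-in, not the paper's feature set.
lone_pairs = [0.0, 0.0, 2.0]
enriched = [f + [lp] for f, lp in zip(sparse, lone_pairs)]

sparse_out = message_passing_step(sparse, bonds)
enriched_out = message_passing_step(enriched, bonds)

print(sparse_out)
print(enriched_out)
```

After one step, the carbon bonded to oxygen carries a nonzero value in the extra channel, while the terminal carbon does not. This shows in miniature how additional per-node information propagates through a message-passing model.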

