LG · CV · IV · QM · Oct 7, 2021

Pre-training Molecular Graph Representation with 3D Geometry

arXiv:2110.07728v2 · 453 citations
Originality: Incremental advance
AI Analysis

This addresses a bottleneck in drug and material discovery by improving molecular functionality prediction, though the advance is incremental because it builds on existing self-supervised learning approaches.

The paper tackles the problem of learning molecular graph representations when 3D geometric information is often unavailable. It proposes a self-supervised pre-training framework that uses the consistency between 2D and 3D views of a molecule to enhance a 2D graph encoder, and reports that this encoder consistently outperforms existing graph SSL methods.

Molecular graph representation learning is a fundamental problem in modern drug and material discovery. Molecular graphs are typically modeled by their 2D topological structures, but it has been recently discovered that 3D geometric information plays a more vital role in predicting molecular functionalities. However, the lack of 3D information in real-world scenarios has significantly impeded the learning of geometric graph representation. To cope with this challenge, we propose the Graph Multi-View Pre-training (GraphMVP) framework where self-supervised learning (SSL) is performed by leveraging the correspondence and consistency between 2D topological structures and 3D geometric views. GraphMVP effectively learns a 2D molecular graph encoder that is enhanced by richer and more discriminative 3D geometry. We further provide theoretical insights to justify the effectiveness of GraphMVP. Finally, comprehensive experiments show that GraphMVP can consistently outperform existing graph SSL methods.
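To make "consistency between 2D and 3D views" concrete, here is a minimal sketch of one common way to score it: a symmetric InfoNCE contrastive loss in which the 2D and 3D embeddings of the same molecule form a positive pair and every other pairing in the batch is a negative. This is an illustrative assumption, not GraphMVP's exact objective; the function names, the cosine similarity, and the temperature of 0.1 are all hypothetical choices, and the sketch assumes the 2D and 3D encoders have already produced one embedding per molecule per view.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_2d_3d(z2d: List[Vector], z3d: List[Vector],
                   temperature: float = 0.1) -> float:
    """Hypothetical symmetric InfoNCE loss between paired 2D and 3D embeddings.

    Row i of z2d and row i of z3d are assumed to be two views of the same
    molecule (the positive pair); all other rows in the batch act as negatives.
    """
    n = len(z2d)
    assert n == len(z3d) and n > 0, "need one 3D embedding per 2D embedding"
    # sim[i][j]: similarity of molecule i's 2D view to molecule j's 3D view.
    sim = [[cosine(z2d[i], z3d[j]) / temperature for j in range(n)]
           for i in range(n)]

    def cross_entropy_rows(matrix: List[List[float]]) -> float:
        # Mean over rows of -log softmax(row)[i], i.e. the diagonal is the target.
        total = 0.0
        for i, row in enumerate(matrix):
            m = max(row)  # subtract the max for a numerically stable log-sum-exp
            log_sum_exp = m + math.log(sum(math.exp(s - m) for s in row))
            total += log_sum_exp - row[i]
        return total / len(matrix)

    sim_t = [list(col) for col in zip(*sim)]  # 3D-to-2D direction
    return 0.5 * (cross_entropy_rows(sim) + cross_entropy_rows(sim_t))


if __name__ == "__main__":
    z2d = [[1.0, 0.0], [0.0, 1.0]]
    aligned = [[0.9, 0.1], [0.1, 0.9]]   # 3D views match their own molecules
    swapped = [[0.1, 0.9], [0.9, 0.1]]   # 3D views matched to the wrong molecules
    print(info_nce_2d_3d(z2d, aligned) < info_nce_2d_3d(z2d, swapped))
```

Minimizing a loss of this kind pushes the 2D encoder toward embeddings that agree with the 3D view of the same molecule. Since only the 2D encoder is retained after pre-training, molecules without known 3D geometry can still be embedded downstream.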

Code Implementations: 1 repo
