LG · AI · Jul 14, 2022

Unified 2D and 3D Pre-Training of Molecular Representations

Microsoft
arXiv:2207.08806v1 · 89 citations · h-index: 91
Originality: Highly original
AI Analysis

This work addresses the challenge of integrating 2D and 3D molecular data for improved property prediction and generation, offering a novel approach with broad applications in chemistry and drug discovery.

The paper tackles the problem of molecular representation learning by proposing a unified 2D and 3D pre-training method, achieving state-of-the-art results on 10 out of 11 downstream tasks with an average improvement of 8.3% on 2D-only tasks.

Molecular representation learning has attracted much attention recently. A molecule can be viewed as a 2D graph with nodes/atoms connected by edges/bonds, and can also be represented by a 3D conformation with 3-dimensional coordinates of all atoms. We note that most previous work handles 2D and 3D information separately, while jointly leveraging these two sources may foster a more informative representation. In this work, we explore this appealing idea and propose a new representation learning method based on a unified 2D and 3D pre-training. Atom coordinates and interatomic distances are encoded and then fused with atomic representations through graph neural networks. The model is pre-trained on three tasks: reconstruction of masked atoms and coordinates, 3D conformation generation conditioned on 2D graph, and 2D graph generation conditioned on 3D conformation. We evaluate our method on 11 downstream molecular property prediction tasks: 7 with 2D information only and 4 with both 2D and 3D information. Our method achieves state-of-the-art results on 10 tasks, and the average improvement on 2D-only tasks is 8.3%. Our method also achieves significant improvement on two 3D conformation generation tasks.
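The abstract names the ingredients (encoded interatomic distances fused with atom features, and masked atom/coordinate reconstruction) without giving formulas. The following is a minimal, dependency-free sketch of those ideas, not the paper's architecture. Several choices are assumptions for illustration only: the Gaussian radial-basis encoding of distances, the single residual message-passing step, the fixed `filter_weights` standing in for learned parameters, the 15% mask ratio, and zeroing out masked coordinates. All function names are hypothetical. The reconstruction loss compares interatomic distances rather than raw coordinates, so a prediction does not need to share the target's rotation or translation. The two conditional generation tasks (2D to 3D and 3D to 2D) are not shown.

```python
import math
import random


def pairwise_distances(coords):
    """Euclidean distance between every pair of atoms (coords: list of (x, y, z))."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]


def rbf_encode(d, num_kernels=8, d_max=5.0, gamma=10.0):
    """Expand a scalar distance into Gaussian radial-basis features (assumed encoding).

    Kernel centers are spaced evenly on [0, d_max]; num_kernels must be >= 2.
    """
    centers = [k * d_max / (num_kernels - 1) for k in range(num_kernels)]
    return [math.exp(-gamma * (d - c) ** 2) for c in centers]


def fuse_layer(atom_feats, coords, filter_weights=None, cutoff=5.0):
    """One residual message-passing step that fuses distance encodings with atom features.

    Each atom adds its neighbors' features, weighted by a scalar computed from the
    RBF encoding of their distance. filter_weights is a hypothetical stand-in for
    parameters a real model would learn.
    """
    if filter_weights is None:
        filter_weights = [1.0 / 8] * 8
    dists = pairwise_distances(coords)
    out = []
    for i, h_i in enumerate(atom_feats):
        msg = [0.0] * len(h_i)
        for j, h_j in enumerate(atom_feats):
            if i == j or dists[i][j] > cutoff:
                continue  # no self-messages; ignore atoms beyond the cutoff
            rbf = rbf_encode(dists[i][j], num_kernels=len(filter_weights))
            w = sum(f * r for f, r in zip(filter_weights, rbf))
            msg = [m + w * x for m, x in zip(msg, h_j)]
        out.append([a + m for a, m in zip(h_i, msg)])  # residual update
    return out


def mask_inputs(atom_types, coords, mask_ratio=0.15, mask_token=0, seed=0):
    """Mask a random subset of atoms: replace their type and zero their coordinates."""
    rng = random.Random(seed)
    n = len(atom_types)
    k = max(1, round(mask_ratio * n))  # always mask at least one atom
    masked = sorted(rng.sample(range(n), k))
    types = list(atom_types)
    xyz = [tuple(c) for c in coords]
    for i in masked:
        types[i] = mask_token
        xyz[i] = (0.0, 0.0, 0.0)
    return types, xyz, masked


def distance_reconstruction_loss(pred_coords, true_coords, masked):
    """Mean squared error between predicted and true distances from each masked atom
    to every other atom. Distances are rotation/translation invariant."""
    pd = pairwise_distances(pred_coords)
    td = pairwise_distances(true_coords)
    n = len(true_coords)
    errs = [(pd[i][j] - td[i][j]) ** 2 for i in masked for j in range(n) if j != i]
    return sum(errs) / len(errs) if errs else 0.0


if __name__ == "__main__":
    # Toy 3-atom molecule (water-like geometry in angstroms; illustrative values only).
    types = [8, 1, 1]
    xyz = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
    feats = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
    print(fuse_layer(feats, xyz))
    masked_types, masked_xyz, masked = mask_inputs(types, xyz)
    print(masked, masked_types)
    # In pre-training, a model would predict masked_xyz -> xyz; here we score the corrupted input.
    print(distance_reconstruction_loss(masked_xyz, xyz, masked))
```

Comparing distances instead of coordinates is one common way to keep a 3D reconstruction objective independent of the molecule's orientation; whether the paper uses this exact loss is not stated in the abstract.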

Code Implementations: 1 repo