LGCHEM-PHQM · Jun 15, 2023

Symmetry-Informed Geometric Representation for Molecules, Proteins, and Crystalline Materials

arXiv:2306.09375v1 · 42 citations · h-index: 182
Originality: Synthesis-oriented
AI Analysis

This work addresses a benchmarking gap for machine learning researchers and for scientists in computational chemistry, structural biology, and materials science by providing a unified framework for evaluating geometric representation methods. It is incremental, however: it builds on existing methods rather than introducing a new paradigm.

The paper tackles the lack of benchmarks for symmetry-informed geometric representation methods in scientific domains such as chemistry and materials science. It proposes Geom3D, a platform covering 16 models and 14 pretraining methods across 46 datasets, to help researchers select effective techniques.

Artificial intelligence for scientific discovery has recently generated significant interest within the machine learning and scientific communities, particularly in the domains of chemistry, biology, and materials discovery. For these scientific problems, molecules serve as the fundamental building blocks, and machine learning has emerged as a highly effective tool for modeling their geometric structures. Nevertheless, due to the rapid evolution of the field and the knowledge gap between the scientific (e.g., physics, chemistry, and biology) and machine learning communities, no benchmarking study on geometric representation for such data has been conducted. To address this issue, we first provide a unified view of current symmetry-informed geometric methods, classifying them into three main categories: invariance, equivariance with spherical frame basis, and equivariance with vector frame basis. We then propose a platform, coined Geom3D, for benchmarking the effectiveness of geometric strategies. Geom3D contains 16 advanced symmetry-informed geometric representation models and 14 geometric pretraining methods over 46 diverse datasets, including small molecules, proteins, and crystalline materials. We hope that Geom3D can, on the one hand, eliminate barriers for machine learning researchers interested in exploring scientific problems and, on the other hand, provide valuable guidance for researchers in computational chemistry, structural biology, and materials science, aiding the informed selection of representation techniques for specific applications.
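To make the three categories concrete, here is a minimal toy sketch in NumPy. It is not Geom3D code, and the function names (`invariant_features`, `vector_frame_features`, `spherical_l1`, `wigner_d_l1`) are hypothetical illustrations. It assumes a point cloud of atom positions and checks each symmetry numerically: pairwise distances stay unchanged under rotation and translation (invariance); per-atom vectors built from relative positions weighted by invariant scalars rotate with the input (vector frame basis); and degree-1 real spherical harmonics of edge vectors transform by a Wigner-D matrix (spherical frame basis). The weighting `exp(-d)` is an arbitrary choice, and real spherical-frame models typically also use higher degrees, which this sketch omits.

```python
import numpy as np

def random_rotation(seed=0):
    """Draw a random proper 3x3 rotation matrix via QR decomposition."""
    rng = np.random.default_rng(seed)
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))  # fix column signs so the factorization is unique
    if np.linalg.det(q) < 0:     # flip one column so det = +1 (no reflection)
        q[:, 0] = -q[:, 0]
    return q

def invariant_features(pos):
    """Invariance: pairwise distances ignore rotations and translations."""
    diff = pos[:, None, :] - pos[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def vector_frame_features(pos):
    """Vector frame basis: relative vectors weighted by invariant scalars."""
    diff = pos[:, None, :] - pos[None, :, :]        # (n, n, 3)
    weights = np.exp(-np.linalg.norm(diff, axis=-1)) # any invariant scalar works
    np.fill_diagonal(weights, 0.0)                   # skip self-pairs
    return (weights[..., None] * diff).sum(axis=1)   # (n, 3), rotates with input

def spherical_l1(vecs):
    """Spherical frame basis: degree-1 real spherical harmonics, order (y, z, x)."""
    unit = vecs / np.linalg.norm(vecs, axis=-1, keepdims=True)
    return np.sqrt(3.0 / (4.0 * np.pi)) * unit[..., [1, 2, 0]]

def wigner_d_l1(rot):
    """Wigner-D matrix for degree 1 in (y, z, x) order: a permuted rotation."""
    perm = np.eye(3)[[1, 2, 0]]  # perm @ (x, y, z) = (y, z, x)
    return perm @ rot @ perm.T

rng = np.random.default_rng(1)
pos = rng.normal(size=(5, 3))          # toy "molecule" with 5 atoms
rot = random_rotation(seed=2)
moved = pos @ rot.T + rng.normal(size=3)  # rotate, then translate

assert np.allclose(invariant_features(moved), invariant_features(pos))
assert np.allclose(vector_frame_features(moved), vector_frame_features(pos) @ rot.T)
edges, moved_edges = pos[1:] - pos[0], moved[1:] - moved[0]
assert np.allclose(spherical_l1(moved_edges), spherical_l1(edges) @ wigner_d_l1(rot).T)
print("all symmetry checks passed")
```

The vector-frame and spherical-frame checks differ only in which representation the rotation acts through: the raw 3x3 matrix for Cartesian vectors, and the Wigner-D matrix for spherical-harmonic coefficients. At degree 1 the latter is just a reordering of the former.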

Code Implementations: 1 repo