LG · AI · BM · Jul 18, 2022

FunQG: Molecular Representation Learning Via Quotient Graphs

arXiv:2207.08597v2 · 14 citations · h-index: 14
Originality: Incremental advance
AI Analysis

This work addresses computational and performance bottlenecks in molecular property prediction for chemistry and drug discovery and offers a cost-effective solution. It is incremental, however, because it builds on existing graph coarsening and GNN methods.

The paper tackles limitations of graph neural networks (GNNs) in molecular representation learning, such as computational inefficiency and performance issues, by proposing FunQG, a framework that simplifies molecular graphs into smaller, informative quotient graphs using functional groups. Experiments show it significantly outperforms state-of-the-art baselines on various benchmarks while reducing parameters and computational costs.

Learning expressive molecular representations is crucial for accurate prediction of molecular properties. Despite the significant advances of graph neural networks (GNNs) in molecular representation learning, they generally face limitations such as neighbors-explosion, under-reaching, over-smoothing, and over-squashing. GNNs also tend to have high computational costs because of their large number of parameters. Typically, these limitations emerge or worsen on relatively large graphs or with deeper GNN architectures. One way to overcome these problems is to simplify a molecular graph into a small, rich, and informative one on which GNNs are more efficient and less challenging to train. To this end, we propose a novel molecular graph coarsening framework named FunQG. It uses functional groups, the influential building blocks that determine a molecule's properties, and is based on a graph-theoretic concept called the quotient graph. Experiments show that the resulting informative graphs are much smaller than the original molecular graphs and are thus good candidates for training GNNs. We apply FunQG to popular molecular property prediction benchmarks and compare the performance of some popular baseline GNNs on the resulting datasets with the performance of several state-of-the-art baselines on the original datasets. In these experiments, the method significantly outperforms previous baselines on various datasets while dramatically reducing the number of parameters and the computational cost. FunQG can therefore be used as a simple, cost-effective, and robust method for molecular representation learning.
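
To make the quotient-graph idea concrete, here is a minimal sketch in plain Python. It is not the FunQG implementation: the function name `quotient_graph`, the toy molecule (the heavy atoms of acetic acid), and the hand-written grouping of atoms into functional groups are all assumptions made for illustration. The abstract does not say how functional groups are detected, so the partition is simply given as input.

```python
def quotient_graph(edges, partition):
    """Collapse each block of `partition` into a single supernode.

    edges: iterable of (u, v) pairs of an undirected graph.
    partition: list of disjoint node sets covering every node.
    Returns (block_of, q_edges): block_of maps node -> block index,
    q_edges is a set of frozenset({i, j}) for blocks joined by an edge.
    """
    block_of = {}
    for idx, block in enumerate(partition):
        for node in block:
            if node in block_of:
                raise ValueError(f"node {node!r} appears in two blocks")
            block_of[node] = idx

    q_edges = set()
    for u, v in edges:
        bu, bv = block_of[u], block_of[v]
        if bu != bv:  # edges inside a block vanish in the quotient
            q_edges.add(frozenset((bu, bv)))
    return block_of, q_edges


# Toy molecule (assumption): heavy atoms of acetic acid, CH3-COOH.
atoms = {0: "C", 1: "C", 2: "O", 3: "O"}
bonds = [(0, 1), (1, 2), (1, 3)]

# Hand-specified partition (assumption): methyl carbon alone,
# carboxyl group {C, =O, -OH} as one block.
groups = [{0}, {1, 2, 3}]

block_of, q_edges = quotient_graph(bonds, groups)
print(f"{len(atoms)} atoms -> {len(groups)} supernodes, "
      f"{len(bonds)} bonds -> {len(q_edges)} quotient edge(s)")
```

In this toy case four atoms and three bonds reduce to two supernodes joined by one edge, which is the kind of size reduction the abstract describes. A real pipeline would also need to aggregate atom and bond features onto the supernodes, which this sketch leaves out.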

Code Implementations: 1 repo
