LG QM Sep 4, 2023

Hierarchical Grammar-Induced Geometry for Data-Efficient Molecular Property Prediction

arXiv:2309.01788v1 · h-index: 81 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the data scarcity problem in material and drug discovery, offering an incremental improvement over existing deep learning methods for molecular property prediction.

The paper tackles the challenge of limited labeled data for molecular property prediction. It proposes a data-efficient predictor that uses a learnable hierarchical molecular grammar to induce an explicit geometry over molecular graphs. The predictor outperforms a range of baselines on both small and large datasets, including in cases with extremely limited data.

The prediction of molecular properties is a crucial task in the field of material and drug discovery. The potential benefits of using deep learning techniques are reflected in the wealth of recent literature. Still, these techniques are faced with a common challenge in practice: Labeled data are limited by the cost of manual extraction from literature and laborious experimentation. In this work, we propose a data-efficient property predictor by utilizing a learnable hierarchical molecular grammar that can generate molecules from grammar production rules. Such a grammar induces an explicit geometry of the space of molecular graphs, which provides an informative prior on molecular structural similarity. The property prediction is performed using graph neural diffusion over the grammar-induced geometry. On both small and large datasets, our evaluation shows that this approach outperforms a wide spectrum of baselines, including supervised and pre-trained graph neural networks. We include a detailed ablation study and further analysis of our solution, showing its effectiveness in cases with extremely limited data. Code is available at https://github.com/gmh14/Geo-DEG.
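To make the idea of predicting properties by diffusion over a geometry more concrete, here is a minimal sketch. It is not the authors' method: it replaces graph neural diffusion with plain, non-learned heat diffusion (label propagation), and it assumes a hand-written toy graph whose nodes stand for molecules and whose edge weights stand for closeness under the grammar. The function name `diffuse_properties`, the node names, and the property values are all hypothetical.

```python
def diffuse_properties(adjacency, known, steps=200, step_size=0.2):
    """Spread known property values over a weighted graph.

    adjacency: dict mapping node -> {neighbor: weight}, assumed symmetric.
    known:     dict mapping labeled node -> measured property value.
    Returns a dict mapping every node to an estimated property value.

    Simplified stand-in for graph neural diffusion: an explicit-Euler
    heat-diffusion step with labeled nodes held fixed (no learned parameters).
    """
    # Start unlabeled nodes at the mean of the known labels.
    mean_known = sum(known.values()) / len(known)
    values = {node: known.get(node, mean_known) for node in adjacency}

    for _ in range(steps):
        updated = {}
        for node, neighbors in adjacency.items():
            if node in known:
                # Labeled molecules act as fixed boundary conditions.
                updated[node] = known[node]
                continue
            degree = sum(neighbors.values())
            if degree == 0:
                # An isolated node receives no information; keep its value.
                updated[node] = values[node]
                continue
            # Move the node's value toward the weighted mean of its neighbors.
            neighbor_mean = sum(
                weight * values[other] for other, weight in neighbors.items()
            ) / degree
            updated[node] = values[node] + step_size * (neighbor_mean - values[node])
        values = updated
    return values


# Hypothetical geometry: four molecules on a chain, A - B - C - D.
toy_geometry = {
    "A": {"B": 1.0},
    "B": {"A": 1.0, "C": 1.0},
    "C": {"B": 1.0, "D": 1.0},
    "D": {"C": 1.0},
}
# Only two molecules have measured (made-up) property values.
estimates = diffuse_properties(toy_geometry, {"A": 0.0, "D": 3.0})
```

With only A and D labeled, the estimates for B and C settle close to 1.0 and 2.0, the values interpolated along the chain. This illustrates why a geometry that places structurally similar molecules near each other can act as a useful prior when labels are scarce.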

Code Implementations: 1 repo
