LG · ML · May 15, 2023

MolHF: A Hierarchical Normalizing Flow for Molecular Graph Generation

arXiv:2305.08457v1 · 13 citations · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the challenge of generating large, complex molecules for scientific applications such as drug discovery. It is an incremental advance that introduces hierarchical generation to flow-based models.

The paper tackles molecular graph generation with MolHF, a hierarchical normalizing flow that generates molecules in a coarse-to-fine manner. MolHF achieves state-of-the-art performance in random generation and property optimization, and it can model larger molecules with more than 100 heavy atoms.

Molecular de novo design is a critical yet challenging task in scientific fields: it aims to design novel molecular structures with desired property profiles. Significant progress has been made by applying generative models for graphs. However, limited attention has been paid to hierarchical generative models, which can exploit the inherent hierarchical structure (with rich semantic information) of molecular graphs and generate larger, more complex molecules, which we show are difficult for most existing models. The primary challenge in hierarchical generation is non-differentiability, caused by generating intermediate discrete coarsened graph structures. To sidestep this issue, we cast the hierarchical generation problem over discrete spaces as the reverse process of hierarchical representation learning and propose MolHF, a new hierarchical flow-based model that generates molecular graphs in a coarse-to-fine manner. Specifically, MolHF first generates bonds through a multi-scale architecture, then generates atoms conditioned on the coarsened graph structure at each scale. We demonstrate that MolHF achieves state-of-the-art performance in random generation and property optimization, indicating its high capacity to model the data distribution. Furthermore, MolHF is the first flow-based model that can be applied to model larger molecules (polymers) with more than 100 heavy atoms. The code and models are available at https://github.com/violet-sto/MolHF.
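The abstract's key move relies on a property of normalizing flows: generation is the exact inverse of encoding. So a multi-scale encoder that passes a smaller, coarser representation on to each next scale automatically gives a decoder that generates coarse-to-fine. This pure-Python sketch illustrates only that generic principle, using RealNVP/Glow-style affine coupling plus multi-scale factoring-out. It is not MolHF's implementation. MolHF works on bond and atom tensors with graph coarsening. Here, by contrast, the input is a plain vector, the "coarser" representation is simply the retained half, and the conditioner `_scale_shift` is a fixed toy function standing in for a learned network. All function names are hypothetical.

```python
import math


def _scale_shift(h, k):
    """Toy conditioner (hypothetical stand-in for a neural network):
    elementwise log-scale s and shift t computed from the conditioning half h."""
    s = [0.5 * math.tanh(v + 0.1 * k) for v in h]
    t = [0.3 * v - 0.05 * k for v in h]
    return s, t


def coupling_forward(x, k, flip):
    """Affine coupling: one half passes through unchanged and conditions the other."""
    n = len(x) // 2
    a, b = (x[n:], x[:n]) if flip else (x[:n], x[n:])
    s, t = _scale_shift(a, k)
    b = [bi * math.exp(si) + ti for bi, si, ti in zip(b, s, t)]
    # Triangular Jacobian: log|det| is the sum of the log-scales.
    return (b + a if flip else a + b), sum(s)


def coupling_inverse(y, k, flip):
    """Exact inverse of coupling_forward; the conditioning half is unchanged."""
    n = len(y) // 2
    a, b = (y[n:], y[:n]) if flip else (y[:n], y[n:])
    s, t = _scale_shift(a, k)
    b = [(bi - ti) * math.exp(-si) for bi, si, ti in zip(b, s, t)]
    return b + a if flip else a + b


def encode(x, num_scales):
    """Multi-scale encoding: at each scale, couple, then factor out half as a latent."""
    h, latents, logdet = list(x), [], 0.0
    for k in range(num_scales):
        for flip in (False, True):
            h, ld = coupling_forward(h, k, flip)
            logdet += ld
        n = len(h) // 2
        latents.append(h[n:])  # fine-scale latent is factored out here
        h = h[:n]              # the remaining half goes on to the next scale
    latents.append(h)          # coarsest latent
    return latents, logdet


def decode(latents, num_scales):
    """Generation = inverse of encoding: start from the coarsest latent, refine."""
    h = list(latents[-1])
    for k in reversed(range(num_scales)):
        h = h + latents[k]
        for flip in (True, False):  # undo couplings in reverse order
            h = coupling_inverse(h, k, flip)
    return h


def log_likelihood(x, num_scales):
    """Change of variables: log p(x) = log N(z; 0, I) + sum of log-determinants."""
    latents, logdet = encode(x, num_scales)
    z = [v for lat in latents for v in lat]
    log_prior = sum(-0.5 * v * v - 0.5 * math.log(2 * math.pi) for v in z)
    return log_prior + logdet


if __name__ == "__main__":
    x = [0.2, -1.0, 0.5, 1.3, -0.7, 0.0, 0.9, -0.4]
    latents, _ = encode(x, num_scales=3)
    print([len(z) for z in latents])  # [4, 2, 1, 1]
    x_rec = decode(latents, num_scales=3)
    assert all(abs(a - b) < 1e-9 for a, b in zip(x, x_rec))
    print(log_likelihood(x, num_scales=3))
```

Each coupling layer has a triangular Jacobian, so its log-determinant reduces to a sum of log-scales. That is why the exact likelihood is cheap to evaluate. Sampling `latents` from a standard Gaussian and calling `decode` would generate coarse-to-fine, with no discrete, non-differentiable intermediate step in this continuous toy setting.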

Code Implementations: 1 repo
