LG · Oct 11, 2025

Hierarchical Bayesian Flow Networks for Molecular Graph Generation

arXiv:2510.10211v2 · 2 citations · h-index: 7
Originality: Highly original
AI Analysis

This work addresses a fundamental limitation in molecular graph generation, a task central to drug discovery and materials science, and offers a new approach aimed at improving the diversity and generalization of generated molecules.

The paper tackles the discrepancy between training and inference in molecular graph generation by proposing GraphBFN, a hierarchical framework built on Bayesian Flow Networks. GraphBFN uses cumulative distribution functions to unify the training objective with the rounding step applied at sampling time, and it reports state-of-the-art results on the QM9 and ZINC250k benchmarks with faster generation.

Molecular graph generation is essentially a classification problem: the model must predict the discrete categories of atoms and bonds. Prevailing paradigms such as continuous diffusion models are instead trained to predict continuous values, which treats training as a regression task. Final generation then requires a rounding step to convert these predictions back into discrete categories, and rounding is intrinsically a classification operation. Because rounding is not part of training, there is a significant discrepancy between the model's training objective and its inference procedure. As a consequence, excessive emphasis on point-wise precision can lead to overfitting and inefficient learning, since considerable effort goes into capturing intra-bin variations that are irrelevant to the discrete nature of the task. This flaw reduces molecular diversity and limits the model's generalization. To address this limitation, we propose GraphBFN, a hierarchical coarse-to-fine framework based on Bayesian Flow Networks that operates on the parameters of distributions. By introducing the cumulative distribution function (CDF), GraphBFN can calculate the probability of selecting the correct category, thereby unifying the training objective with the rounding operation used at sampling time. We demonstrate that our method achieves superior performance and faster generation, setting new state-of-the-art results on the QM9 and ZINC250k molecular graph generation benchmarks.
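The contrast between a regression objective and a CDF-based objective can be illustrated with a small sketch. It is not the paper's implementation: it assumes a scalar Gaussian prediction N(mu, sigma^2), classes placed at the integers 0..K-1, and rounding bins with edges at the half-integers. The function names (`normal_cdf`, `bin_probabilities`, `regression_loss`, `cdf_classification_loss`) are hypothetical and chosen only for illustration.

```python
import math


def normal_cdf(x, mu, sigma):
    """CDF of the Gaussian N(mu, sigma^2) evaluated at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def bin_probabilities(mu, sigma, num_classes):
    """Probability that a draw from N(mu, sigma^2) rounds to each class.

    Assumption (not from the paper): class k sits at the integer k and its
    rounding bin is [k - 0.5, k + 0.5). The first and last bins extend to
    -inf and +inf so that the probabilities sum to 1.
    """
    probs = []
    for k in range(num_classes):
        lower = 0.0 if k == 0 else normal_cdf(k - 0.5, mu, sigma)
        upper = 1.0 if k == num_classes - 1 else normal_cdf(k + 0.5, mu, sigma)
        probs.append(upper - lower)
    return probs


def regression_loss(mu, target):
    """Squared error: penalizes any distance from the class centre,
    including variation inside the bin that rounding would discard."""
    return (mu - target) ** 2


def cdf_classification_loss(mu, sigma, target, num_classes):
    """Negative log-probability that rounding selects the target class."""
    p = bin_probabilities(mu, sigma, num_classes)[target]
    return -math.log(max(p, 1e-12))  # clamp to avoid log(0)


if __name__ == "__main__":
    # Two predictions that both round to class 1.
    for mu in (1.05, 1.40):
        print(
            f"mu={mu:.2f}",
            f"regression={regression_loss(mu, 1):.4f}",
            f"cdf_nll={cdf_classification_loss(mu, 0.1, 1, 4):.4f}",
        )
```

Under these assumptions, the CDF-based loss is close to zero whenever almost all of the predictive mass lies inside the correct bin, and it grows only as mass leaks across a bin edge. The squared error, by contrast, keeps penalizing the distance from the bin centre even when rounding would already pick the right class.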

