LG · AI · May 15

Multi-level Self-supervised Pretraining on Compositional Hierarchical Graph for Molecular Property Prediction

arXiv:2605.16088 · 25.0
Predicted impact: top 78% in LG · last 90 days · Originality: Highly original
AI Analysis

For researchers in molecular property prediction, this work provides a new pretraining method that achieves the best reported results on seven of nine MoleculeNet benchmarks by combining a hierarchical graph structure with multi-level self-supervised objectives.

MolCHG introduces a multi-level self-supervised pretraining framework using a Compositional Hierarchical Graph with four node types across three semantic levels, achieving best performance on seven out of nine MoleculeNet benchmarks for molecular property prediction.

Self-supervised pretraining on molecular graphs has emerged as a promising approach for molecular property prediction, yet most existing methods operate at a single structural granularity and treat bond information as auxiliary edge attributes rather than as an independent semantic layer. In this work, we propose MolCHG, a multi-level self-supervised pretraining framework built upon a novel Compositional Hierarchical Graph that organizes molecular structure into four types of nodes across three semantic levels. By introducing a bond graph that operates in parallel with the atom graph, our architecture elevates bond-level information to independently evolving node representations, enabling fragment nodes to aggregate atom-level and bond-level semantics on an equal footing. We design three level-specific pretraining objectives: an atom-bond cross-view contrastive task that aligns the atom-view and bond-view representations within each fragment, a fragment-level functional group prediction task to inject domain-relevant chemical knowledge, and graph-level structure prediction tasks to encode global molecular topology. Experiments on nine MoleculeNet benchmarks demonstrate that MolCHG achieves the best performance on seven datasets across both classification and regression tasks, remaining competitive with the strongest baselines on the rest. Ablation studies further confirm that the multi-level supervision signals are complementary and that each component contributes to the overall performance.
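The two ideas at the core of the abstract, a bond graph running in parallel with the atom graph and a cross-view contrastive objective, can be illustrated with a minimal sketch. The abstract does not say how the bond graph is built or which contrastive loss is used, so everything below is an assumption: `build_bond_graph` uses the common line-graph construction (two bond nodes are linked when they share an atom), and `info_nce` is a standard one-directional InfoNCE loss in which row i of each view is taken to be the same fragment. Both function names and the `temperature` parameter are hypothetical and are not taken from MolCHG.

```python
import math
from itertools import combinations


def build_bond_graph(bonds):
    """Link bond nodes that share an atom (line-graph construction, assumed).

    bonds: list of (atom_i, atom_j) pairs; a bond's position in the list
    is used as its bond-node id.
    """
    incident = {}  # atom index -> ids of the bonds touching that atom
    for bond_id, (a, b) in enumerate(bonds):
        incident.setdefault(a, []).append(bond_id)
        incident.setdefault(b, []).append(bond_id)

    edges = set()
    for bond_ids in incident.values():
        # Every pair of bonds meeting at the same atom becomes an edge.
        for u, v in combinations(sorted(bond_ids), 2):
            edges.add((u, v))
    return sorted(edges)


def _normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def info_nce(atom_view, bond_view, temperature=0.1):
    """One-directional InfoNCE loss (assumed form, not the paper's).

    atom_view[i] and bond_view[i] are embeddings of the same fragment and
    form the positive pair; all other rows of bond_view act as negatives.
    """
    a = [_normalize(v) for v in atom_view]
    b = [_normalize(v) for v in bond_view]
    total = 0.0
    for i, a_i in enumerate(a):
        logits = [sum(x * y for x, y in zip(a_i, b_j)) / temperature for b_j in b]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denominator - logits[i]
    return total / len(a)


# Heavy-atom skeleton of isobutane: atom 1 is bonded to atoms 0, 2 and 3.
bond_edges = build_bond_graph([(0, 1), (1, 2), (1, 3)])

# Toy fragment embeddings: the views agree in `aligned`, disagree in `swapped`.
atom_emb = [[1.0, 0.0], [0.0, 1.0]]
aligned = info_nce(atom_emb, [[1.0, 0.0], [0.0, 1.0]])
swapped = info_nce(atom_emb, [[0.0, 1.0], [1.0, 0.0]])
```

In this toy setup all three isobutane bonds meet at the central atom, so every pair of bond nodes is connected, and the loss is lower when the two views of each fragment agree than when they are swapped. A real implementation would compute the embeddings with learned graph encoders rather than fixed vectors.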
