CEAI Jan 8

A Semi-supervised Molecular Learning Framework for Activity Cliff Estimation

arXiv:2601.04507v1 | 7 citations | h-index: 1 | IJCAI
Originality: Incremental advance
AI Analysis

This work addresses a critical bottleneck in drug discovery and material design by improving ML performance on activity cliffs, though it is an incremental advance in semi-supervised learning methods.

The paper tackles activity cliffs, which degrade the performance of molecular property prediction models, by proposing a semi-supervised learning method called SemiMol that combines pseudo-labels with curriculum learning. The results show that SemiMol significantly enhances graph-based ML architectures and outperforms state-of-the-art baselines on 30 activity cliff datasets.

Machine learning (ML) enables fast and accurate molecular property prediction, which is of interest in drug discovery and material design. Its success rests on the similarity principle, which assumes that similar molecules exhibit similar properties. However, activity cliffs, where structurally similar molecules differ sharply in activity, challenge this principle, and their presence leads to a sharp decline in the performance of existing ML algorithms, particularly graph-based methods. To overcome this obstacle in a low-data scenario, we propose a novel semi-supervised learning (SSL) method dubbed SemiMol, which uses predictions on large amounts of unannotated data as pseudo-signals for subsequent training. Specifically, we introduce an additional instructor model to evaluate the accuracy and trustworthiness of proxy labels, because existing pseudo-labeling approaches rely on probabilistic outputs to reveal the model's confidence and therefore cannot be applied to regression tasks. Moreover, we design a self-adaptive curriculum learning algorithm that progressively moves the target model toward hard samples at a controllable pace. Extensive experiments on 30 activity cliff datasets demonstrate that SemiMol significantly enhances graph-based ML architectures and outperforms state-of-the-art pretraining and SSL baselines.
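The pseudo-labeling idea described in the abstract can be illustrated with a minimal Python sketch. This is not the SemiMol implementation: the function names (`select_pseudo_labels`, `curriculum_thresholds`), the assumption that the instructor returns a trust score in [0, 1], and the linear threshold schedule are all hypothetical stand-ins chosen for illustration. The abstract only states that an instructor model judges the trustworthiness of regression pseudo-labels and that a self-adaptive curriculum gradually admits harder samples.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical types: a sample is a float here; in practice it would be a molecular graph.
Sample = float
Selected = Tuple[Sample, float, float]  # (sample, pseudo-label, instructor trust score)


def select_pseudo_labels(
    unlabeled: Sequence[Sample],
    target_predict: Callable[[Sample], float],
    instructor_score: Callable[[Sample, float], float],
    threshold: float,
) -> List[Selected]:
    """Keep unlabeled samples whose regression pseudo-label the instructor trusts.

    Assumption: instructor_score returns a value in [0, 1], higher meaning more trusted.
    """
    selected: List[Selected] = []
    for x in unlabeled:
        y_hat = target_predict(x)          # pseudo-label from the target model
        trust = instructor_score(x, y_hat)  # instructor judges the pseudo-label
        if trust >= threshold:
            selected.append((x, y_hat, trust))
    return selected


def curriculum_thresholds(start: float, end: float, rounds: int) -> List[float]:
    """Linearly relax the trust threshold from start to end over the given rounds.

    Assumption: a fixed linear schedule stands in for the paper's self-adaptive pacing.
    Lowering the threshold admits harder (less trusted) samples in later rounds.
    """
    if rounds < 1:
        raise ValueError("rounds must be at least 1")
    if rounds == 1:
        return [start]
    step = (end - start) / (rounds - 1)
    return [start + i * step for i in range(rounds)]
```

In a training loop, each round would call `select_pseudo_labels` with that round's threshold, add the selected pairs to the labeled set, and retrain the target model. Early rounds then see only the pseudo-labels the instructor trusts most, and later rounds admit progressively harder samples.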

