LGAIOct 18, 2025

Scaffold-Aware Generative Augmentation and Reranking for Enhanced Virtual Screening

arXiv:2510.16306v11 citationsh-index: 5
Originality Incremental advance
AI Analysis

This work addresses class and structural imbalance in virtual screening for drug discovery, offering a practical solution to identify more diverse active compounds.

The researchers tackled challenges in virtual screening for drug discovery by developing ScaffAug, a scaffold-aware framework that uses generative AI to create synthetic data and reranks results to improve scaffold diversity, achieving enhanced performance across five target classes.

Ligand-based virtual screening (VS) is an essential step in drug discovery that evaluates large chemical libraries to identify compounds that potentially bind to a therapeutic target. However, VS faces three major challenges: class imbalance due to the low active rate, structural imbalance among active molecules where certain scaffolds dominate, and the need to identify structurally diverse active compounds for novel drug development. We introduce ScaffAug, a scaffold-aware VS framework that addresses these challenges through three modules. The augmentation module first generates synthetic data conditioned on scaffolds of actual hits using generative AI, specifically a graph diffusion model. This helps mitigate the class imbalance and furthermore the structural imbalance, due to our proposed scaffold-aware sampling algorithm, designed to produce more samples for active molecules with underrepresented scaffolds. A model-agnostic self-training module is then used to safely integrate the generated synthetic data from our augmentation module with the original labeled data. Lastly, we introduce a reranking module that improves VS by enhancing scaffold diversity in the top recommended set of molecules, while still maintaining and even enhancing the overall general performance of identifying novel, active compounds. We conduct comprehensive computational experiments across five target classes, comparing ScaffAug against existing baseline methods by reporting the performance of multiple evaluation metrics and performing ablation studies on ScaffAug. Overall, this work introduces novel perspectives on effectively enhancing VS by leveraging generative augmentations, reranking, and general scaffold-awareness.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes