LG · Jul 10, 2025

Can AI-predicted complexes teach machine learning to compute drug binding affinity?

Wei-Tse Hsu, Savva Grevtsev, Thomas Douglas, Aniket Magarkar, Philip C. Biggin

arXiv:2507.07882v1 · 1 citation · h-index: 19 · J Chem Inf Model

Originality: Incremental advance

AI Analysis

This work addresses data scarcity in drug discovery by enabling synthetic data augmentation, though it is incremental as it builds on existing co-folding models.

The study asks whether co-folding models can supply synthetic training data for machine learning-based scoring functions that predict drug binding affinity. It finds that performance gains depend on the structural quality of the augmented data, and it establishes heuristics for identifying high-quality predictions without experimental structures.

We evaluate the feasibility of using co-folding models for synthetic data augmentation in training machine learning-based scoring functions (MLSFs) for binding affinity prediction. Our results show that performance gains depend critically on the structural quality of augmented data. In light of this, we established simple heuristics for identifying high-quality co-folding predictions without reference structures, enabling them to substitute for experimental structures in MLSF training. Our study informs future data augmentation strategies based on co-folding models.
