Can AI-predicted complexes teach machine learning to compute drug binding affinity?
This work addresses data scarcity in drug discovery by enabling synthetic data augmentation, though its contribution is incremental because it builds on existing co-folding models.
The study asks whether co-folding models can supply synthetic training data for machine learning-based scoring functions that predict drug binding affinity. It finds that performance gains depend on the structural quality of the augmented data, and it establishes heuristics for identifying high-quality predictions without experimental structures.
We evaluate the feasibility of using co-folding models for synthetic data augmentation in training machine learning-based scoring functions (MLSFs) for binding affinity prediction. Our results show that performance gains depend critically on the structural quality of the augmented data. In light of this, we establish simple heuristics for identifying high-quality co-folding predictions without reference structures, enabling them to substitute for experimental structures in MLSF training. Our study informs future data augmentation strategies based on co-folding models.
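
The abstract does not spell out the heuristics, so the following is only a minimal sketch of what reference-free filtering might look like. It assumes each predicted complex carries a model-reported confidence score in [0, 1]. The `PredictedComplex` type, its field names, and the 0.8 cutoff are all hypothetical and are not taken from the study.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class PredictedComplex:
    """Hypothetical record for one co-folding prediction."""
    complex_id: str
    confidence: float      # assumed model-reported confidence in [0, 1]
    affinity_label: float  # measured binding affinity used as the training target


def select_for_augmentation(
    predictions: Iterable[PredictedComplex],
    min_confidence: float = 0.8,  # illustrative cutoff, not a value from the study
) -> List[PredictedComplex]:
    """Keep only predictions whose confidence meets the cutoff."""
    return [p for p in predictions if p.confidence >= min_confidence]


def build_training_set(
    experimental: Iterable[PredictedComplex],
    predictions: Iterable[PredictedComplex],
    min_confidence: float = 0.8,
) -> List[PredictedComplex]:
    """Combine experimental structures with the filtered predictions."""
    return list(experimental) + select_for_augmentation(predictions, min_confidence)


predicted = [
    PredictedComplex("complex_a", 0.92, 7.1),
    PredictedComplex("complex_b", 0.41, 5.3),
]
kept = select_for_augmentation(predicted)
```

In this sketch only `complex_a` passes the filter. A real pipeline would replace the single confidence field with whatever reference-free quality signals the chosen co-folding model exposes.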