LG · AI · Sep 19, 2024

Improving generalisability of 3D binding affinity models in low data regimes

arXiv:2409.12995v1 · h-index: 5
Originality: Incremental advance
AI Analysis

This work addresses the challenge of developing generalizable binding affinity models for computer-aided drug design, particularly in low-data scenarios, with incremental improvements to existing methods.

The paper aims to improve the generalizability of 3D binding affinity models in low data regimes. It introduces a new split of the PDBBind dataset that minimizes similarity leakage between train and test sets, shows that 3D global models generally outperform protein-specific local models on this split, and reports performance gains for GNNs from supervised pre-training on quantum mechanical data, unsupervised pre-training via small molecule diffusion, and explicit hydrogen modeling.

Predicting protein-ligand binding affinity is an essential part of computer-aided drug design. However, generalisable and performant global binding affinity models remain elusive, particularly in low data regimes. Despite the evolution of model architectures, current benchmarks are not well-suited to probe the generalisability of 3D binding affinity models. Furthermore, 3D global architectures such as GNNs have not lived up to performance expectations. To investigate these issues, we introduce a novel split of the PDBBind dataset, minimizing similarity leakage between train and test sets and allowing for a fair and direct comparison between various model architectures. On this low similarity split, we demonstrate that, in general, 3D global models are superior to protein-specific local models in low data regimes. We also demonstrate that the performance of GNNs benefits from three novel contributions: supervised pre-training via quantum mechanical data, unsupervised pre-training via small molecule diffusion, and explicitly modeling hydrogen atoms in the input graph. We believe that this work introduces promising new approaches to unlock the potential of GNN architectures for binding affinity modelling.
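The idea of a split that limits similarity leakage can be illustrated with a minimal sketch. The abstract does not state how the authors build their PDBBind split, so everything below is an assumption made for illustration: the toy feature-set fingerprints, the Tanimoto threshold, the single-linkage clustering, and the hypothetical names `cluster_by_similarity` and `low_similarity_split`. The sketch only shows the general principle that whole similarity clusters are assigned to one side of the split, so no test item has a close neighbour in the training set.

```python
from itertools import combinations


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two feature sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_by_similarity(fingerprints: dict, threshold: float) -> list:
    """Single-linkage clustering: any pair at or above the threshold shares a cluster."""
    parent = {name: name for name in fingerprints}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(fingerprints, 2):
        if tanimoto(fingerprints[a], fingerprints[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in fingerprints:
        clusters.setdefault(find(name), []).append(name)
    return list(clusters.values())


def low_similarity_split(fingerprints, threshold=0.5, test_fraction=0.25):
    """Assign whole clusters to the test set until the target size is reached."""
    clusters = sorted(cluster_by_similarity(fingerprints, threshold), key=len)
    target = test_fraction * len(fingerprints)
    train, test = [], []
    for cluster in clusters:
        # Smallest clusters are tried first; a cluster is never divided.
        if len(test) + len(cluster) <= target:
            test.extend(cluster)
        else:
            train.extend(cluster)
    return train, test


# Toy data: hypothetical complex IDs mapped to hypothetical feature sets.
toy = {
    "c1": frozenset({1, 2, 3, 4}),
    "c2": frozenset({1, 2, 3, 5}),
    "c3": frozenset({10, 11, 12}),
    "c4": frozenset({10, 11, 13}),
    "c5": frozenset({20, 21}),
    "c6": frozenset({30, 31}),
    "c7": frozenset({1, 2, 4, 5}),
    "c8": frozenset({40, 41}),
}
train_ids, test_ids = low_similarity_split(toy, threshold=0.5, test_fraction=0.25)
```

Because clusters are formed by single linkage at the chosen threshold, every train/test pair in this sketch falls below that threshold. A real protein-ligand split would also need to account for protein sequence or pocket similarity, which this toy example does not attempt.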

