AI-Bind: Improving Binding Predictions for Novel Protein Targets and Ligands
The paper addresses a critical bottleneck in drug discovery, the identification of new drug-target interactions, though the contribution is incremental because it builds on existing network-based and pre-training methods.
The paper tackles the failure of deep learning models to generalize to novel protein-ligand interactions and introduces AI-Bind, a pipeline that improves binding predictions for novel targets and ligands. It is validated by predicting compounds that bind SARS-CoV-2 proteins and checking those predictions with docking simulations.
Identifying novel drug-target interactions (DTI) is a critical and rate-limiting step in drug discovery. While deep learning models have been proposed to accelerate the identification process, we show that state-of-the-art models fail to generalize to novel (i.e., never-before-seen) structures. We first unveil the mechanisms responsible for this shortcoming, demonstrating how models rely on shortcuts that leverage the topology of the protein-ligand bipartite network, rather than learning the node features. Then, we introduce AI-Bind, a pipeline that combines network-based sampling strategies with unsupervised pre-training, allowing us to limit the annotation imbalance and improve binding predictions for novel proteins and ligands. We illustrate the value of AI-Bind by predicting drugs and natural compounds with binding affinity to SARS-CoV-2 viral proteins and the associated human proteins. We also validate these predictions via docking simulations and comparison with recent experimental evidence, and advance the interpretation of machine learning predictions of protein-ligand binding by identifying potential active binding sites on the amino acid sequence. Overall, AI-Bind offers a powerful high-throughput approach to identify drug-target combinations, with the potential to become a valuable tool in drug discovery.
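Two of the abstract's technical claims can be made concrete with a small sketch. First, a model can score pairs from network topology alone, without any protein or ligand features. Second, negatives can be drawn from the structure of the bipartite network. Everything below is hypothetical and illustrative, not AI-Bind's code. `shortcut_score` ranks a pair only by how often each node was annotated as binding during training. A never-seen protein or ligand falls back to an uninformative default, which mirrors the generalization failure described above. `distant_negatives` proposes non-binding candidates as pairs that lie far apart (or are disconnected) in the network of known binders. This is one plausible form of network-based sampling. The function names, the averaging rule, and the `min_dist` threshold are assumptions, and the paper's actual criteria may differ.

```python
from collections import defaultdict, deque


def positive_ratios(labeled_pairs):
    """Fraction of each node's training annotations that are positive (binding).

    labeled_pairs: iterable of (protein_id, ligand_id, label), label in {0, 1}.
    Nodes are keyed as ("P", id) or ("L", id) so protein and ligand ids cannot collide.
    """
    pos = defaultdict(int)
    total = defaultdict(int)
    for protein, ligand, label in labeled_pairs:
        for node in (("P", protein), ("L", ligand)):
            pos[node] += label
            total[node] += 1
    return {node: pos[node] / total[node] for node in total}


def shortcut_score(ratios, protein, ligand, default=0.5):
    """Hypothetical topology-only score: mean of the two nodes' positive ratios.

    No amino-acid or chemical features are used. A node absent from training
    gets the uninformative `default`, so novel structures receive no signal.
    """
    rp = ratios.get(("P", protein), default)
    rl = ratios.get(("L", ligand), default)
    return (rp + rl) / 2


def distant_negatives(positive_pairs, min_dist):
    """Hypothetical network-based negative sampling.

    positive_pairs: list of (protein_id, ligand_id) known binders (iterated more
    than once, so pass a list rather than a generator). A pair qualifies as a
    candidate negative if its shortest-path distance in the bipartite network
    is at least `min_dist`, or if the two nodes are disconnected. Protein-ligand
    distances in a bipartite graph are odd: 1 (known binder), 3, 5, ...
    """
    adj = defaultdict(set)
    for protein, ligand in positive_pairs:
        adj[("P", protein)].add(("L", ligand))
        adj[("L", ligand)].add(("P", protein))
    proteins = sorted({p for p, _ in positive_pairs})
    ligands = sorted({l for _, l in positive_pairs})

    negatives = []
    for protein in proteins:
        # Breadth-first search yields unweighted shortest-path distances from this protein.
        start = ("P", protein)
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        for ligand in ligands:
            d = dist.get(("L", ligand))
            if d is None or d >= min_dist:
                negatives.append((protein, ligand))
    return negatives
```

For example, with training annotations where protein P1 binds both of its ligands and P2 binds neither, `shortcut_score` separates them (0.75 vs. 0.25 against the same ligand) purely from annotation counts. An unseen pair such as ("P9", "L9") always scores 0.5. Sampling negatives by network distance rather than uniformly is meant to pair nodes that share no nearby binding evidence. The intent is to reduce how much annotation counts alone can reveal about the label.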