LG Mar 12, 2018

PotentialNet for Molecular Property Prediction

Evan N. Feinberg, Debnil Sur, Zhenqin Wu, Brooke E. Husic, Huanghao Mai, Yang Li, Saisai Sun, Jianyi Yang, Bharath Ramsundar, Vijay S. Pande

arXiv:1803.04465v2 20.8424 citations

Originality: Incremental advance

AI Analysis

This work addresses the problem of optimizing molecular properties across scales for drug discovery, offering incremental improvements through tailored deep learning models and evaluation methods.

The authors tackle molecular property prediction for drug discovery by introducing the PotentialNet family of graph convolutions, which achieve state-of-the-art performance on protein-ligand binding affinity and set new standards on several ligand-based tasks. They also introduce a new metric, the Regression Enrichment Factor, and a cross-validation strategy based on structural homology clustering that more accurately measures model generalizability.

The arc of drug discovery entails a multiparameter optimization problem spanning vast length scales. The key parameters range from solubility (angstroms) to protein-ligand binding (nanometers) to in vivo toxicity (meters). Through feature learning, instead of feature engineering, deep neural networks promise to outperform both traditional physics-based and knowledge-based machine learning models for predicting molecular properties pertinent to drug discovery. To this end, we present the PotentialNet family of graph convolutions. These models are specifically designed for, and achieve state-of-the-art performance on, protein-ligand binding affinity prediction. We further validate these deep neural networks by setting new standards of performance in several ligand-based tasks. In parallel, we introduce a new metric, the Regression Enrichment Factor $EF_\chi^{(R)}$, to measure the early enrichment of computational models for chemical data. Finally, we introduce a cross-validation strategy based on structural homology clustering that can more accurately measure model generalizability, which crucially distinguishes the aims of machine learning for drug discovery from standard machine learning tasks.
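
The abstract names the Regression Enrichment Factor but does not define it. The sketch below assumes one plausible reading: rank molecules by predicted value, take the top fraction $\chi$, and average the z-scores of their measured values. The function name, the argument names, the rounding of $\chi N$ up to a whole molecule, and the use of the population standard deviation are all illustrative assumptions, not details taken from this page.

```python
import math
from statistics import mean, pstdev


def regression_enrichment_factor(y_true, y_pred, chi=0.01):
    """Mean z-score of measured values among the top-chi fraction by prediction.

    Assumed definition (not stated in the abstract): higher values mean the
    model places genuinely high-valued molecules at the top of its ranking.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    n = len(y_true)
    if n == 0:
        raise ValueError("need at least one sample")
    if not 0 < chi <= 1:
        raise ValueError("chi must be in (0, 1]")

    mu = mean(y_true)
    sigma = pstdev(y_true)  # population standard deviation (assumption)
    if sigma == 0:
        raise ValueError("y_true has zero variance; z-scores are undefined")

    # Number of top-ranked samples to keep; always at least one (assumption).
    k = max(1, math.ceil(chi * n))

    # Indices of the k samples with the highest predicted values.
    top = sorted(range(n), key=lambda i: y_pred[i], reverse=True)[:k]

    return mean([(y_true[i] - mu) / sigma for i in top])


if __name__ == "__main__":
    measured = [1.0, 2.0, 3.0, 4.0, 5.0]
    good_model = [0.1, 0.2, 0.3, 0.4, 0.5]  # ranks molecules correctly
    bad_model = [0.5, 0.4, 0.3, 0.2, 0.1]   # ranks molecules in reverse
    print(regression_enrichment_factor(measured, good_model, chi=0.2))
    print(regression_enrichment_factor(measured, bad_model, chi=0.2))
```

Under this reading, a model that ranks well scores positive, a model that ranks in reverse scores negative, and setting `chi=1.0` gives zero because the z-scores of the full dataset average out.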
