Molecular Graph Convolutions: Moving Beyond Fingerprints
This work addresses the limitations of traditional fingerprint representations in cheminformatics for drug discovery. Its performance gains are incremental, since graph convolutions do not yet outperform all fingerprint-based methods.
The authors address molecular representation in drug discovery by introducing molecular graph convolutions, which let models learn directly from the molecular graph rather than from a precomputed fingerprint. Together with other graph-based methods, this approach represents a new paradigm for ligand-based virtual screening, with room for future improvement.
Molecular "fingerprints" encoding structural information are the workhorse of cheminformatics and machine learning in drug discovery applications. However, fingerprint representations necessarily emphasize particular aspects of the molecular structure while ignoring others, rather than allowing the model to make data-driven decisions. We describe molecular "graph convolutions", a machine learning architecture for learning from undirected graphs, specifically small molecules. Graph convolutions use a simple encoding of the molecular graph---atoms, bonds, distances, etc.---which allows the model to take greater advantage of information in the graph structure. Although graph convolutions do not outperform all fingerprint-based methods, they (along with other graph-based methods) represent a new paradigm in ligand-based virtual screening with exciting opportunities for future improvement.
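To make the "simple encoding of the molecular graph" concrete, the following is a minimal sketch of how atoms, bonds, and distances might be laid out as model inputs. It is an illustration under stated assumptions, not the authors' featurization: the toy molecule (the heavy atoms of ethanol), the three-element vocabulary, the helper names, and the reading of "distances" as bond-count (graph) distances are all hypothetical choices made for this example.

```python
from collections import deque

# Hypothetical toy molecule: the heavy atoms of ethanol (C-C-O), hydrogens omitted.
atoms = ["C", "C", "O"]
bonds = [(0, 1, 1), (1, 2, 1)]  # (atom index, atom index, bond order)


def one_hot(symbol, vocabulary=("C", "N", "O")):
    """Encode an element symbol as a one-hot vector (assumed toy vocabulary)."""
    return [1 if symbol == s else 0 for s in vocabulary]


def adjacency(n_atoms, bonds):
    """Build an undirected adjacency list from the bond list."""
    adj = {i: [] for i in range(n_atoms)}
    for i, j, _order in bonds:
        adj[i].append(j)
        adj[j].append(i)
    return adj


def graph_distances(n_atoms, bonds):
    """All-pairs bond-count distances via breadth-first search from each atom."""
    adj = adjacency(n_atoms, bonds)
    dist = [[None] * n_atoms for _ in range(n_atoms)]
    for start in range(n_atoms):
        dist[start][start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[start][v] is None:
                    dist[start][v] = dist[start][u] + 1
                    queue.append(v)
    return dist


# Per-atom features and per-pair features, the two kinds of input sketched here.
atom_features = [one_hot(a) for a in atoms]
pair_distances = graph_distances(len(atoms), bonds)
```

Unlike a fingerprint, nothing in this encoding decides in advance which substructures matter; a model consuming `atom_features` and `pair_distances` is left to learn that from data.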