BM | LG | Mar 27, 2023

HD-Bind: Encoding of Molecular Structure with Low Precision, Hyperdimensional Binary Representations

arXiv:2303.15604v1 | 4 citations | h-index: 23
Originality: Incremental advance
AI Analysis

This work addresses the computational bottleneck of molecular screening in drug discovery, offering an incremental improvement in efficiency.

The paper tackles the problem of efficiently identifying drug-like molecules in large collections. It applies hyperdimensional computing (HDC) with two novel encoding algorithms based on extended connectivity fingerprints (ECFP), reporting inference up to 90 times more efficient than more complex machine learning methods and nearly 9 orders of magnitude faster than molecular docking.

Publicly available collections of drug-like molecules have grown to comprise tens of billions of possibilities in recent history due to advances in chemical synthesis. Traditional methods for identifying "hit" molecules from a large collection of potential drug-like candidates have relied on biophysical theory to compute approximations to the Gibbs free energy of the binding interaction between a drug and its protein target. A major drawback of these approaches is that they require exceptional computing capabilities even for relatively small collections of molecules. Hyperdimensional Computing (HDC) is a recently proposed learning paradigm that leverages low-precision binary vector arithmetic to build efficient representations of the data, obtained without the gradient-based optimization required by many conventional machine learning and deep learning approaches. This algorithmic simplicity allows for acceleration in hardware, which has previously been demonstrated for a range of application areas. We consider existing HDC approaches for molecular property classification and introduce two novel encoding algorithms that leverage the extended connectivity fingerprint (ECFP) algorithm. We show that HDC-based inference methods are as much as 90 times more efficient than more complex representative machine learning methods and achieve an acceleration of nearly 9 orders of magnitude as compared to inference with molecular docking. We demonstrate multiple approaches for the encoding of molecular data for HDC and examine their relative performance on a range of challenging molecular property prediction and drug-protein binding classification tasks. Our work thus motivates further investigation into molecular representation learning to develop ultra-efficient pre-screening tools.
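To make the idea of gradient-free learning with binary vectors concrete, here is a minimal sketch of a generic HDC classifier. It is not the paper's encoding: sets of integers stand in for the active bit positions of an ECFP-style fingerprint, and every name in it (`make_item_memory`, `bundle`, `encode`, `train`, `predict`), the dimensionality `D`, and the toy data are assumptions made for illustration. Each bit position gets a random binary hypervector, a molecule is encoded by a bitwise majority vote over the hypervectors of its active bits, class prototypes are majority votes over the encoded training molecules, and inference picks the prototype with the smallest Hamming distance.

```python
import random

D = 2048  # hypervector dimensionality (assumed; a tunable choice)

def make_item_memory(n_bits, dim=D, seed=0):
    """One random binary hypervector per fingerprint bit position."""
    rng = random.Random(seed)
    return [[rng.getrandbits(1) for _ in range(dim)] for _ in range(n_bits)]

def bundle(vectors):
    """Bitwise majority vote across vectors; ties resolve to 1."""
    n = len(vectors)
    return [1 if 2 * sum(column) >= n else 0 for column in zip(*vectors)]

def encode(on_bits, item_memory):
    """Encode a fingerprint, given as a set of active bit indices."""
    return bundle([item_memory[i] for i in sorted(on_bits)])

def hamming(a, b):
    """Number of positions where two binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))

def train(samples, item_memory):
    """samples: list of (on_bits, label). Returns {label: prototype}."""
    by_label = {}
    for on_bits, label in samples:
        by_label.setdefault(label, []).append(encode(on_bits, item_memory))
    return {label: bundle(vectors) for label, vectors in by_label.items()}

def predict(on_bits, prototypes, item_memory):
    """Return the label whose prototype is nearest in Hamming distance."""
    query = encode(on_bits, item_memory)
    return min(prototypes, key=lambda label: hamming(query, prototypes[label]))

# Toy data (hypothetical): "active" molecules share bits 0-3,
# "inactive" molecules share bits 10-13.
item_memory = make_item_memory(n_bits=32)
training = [
    ({0, 1, 2}, "active"), ({0, 1, 3}, "active"), ({1, 2, 3}, "active"),
    ({10, 11, 12}, "inactive"), ({10, 11, 13}, "inactive"),
    ({11, 12, 13}, "inactive"),
]
prototypes = train(training, item_memory)

# Classify two fingerprints that were not in the training set.
print(predict({0, 2, 3}, prototypes, item_memory))
print(predict({10, 12, 13}, prototypes, item_memory))
```

Training here is a single pass of counting and thresholding, and inference is a handful of bitwise comparisons, which is the property the abstract credits for HDC's efficiency and suitability for hardware acceleration. A real pipeline would compute fingerprints with a cheminformatics toolkit and would likely store hypervectors as packed bits rather than Python lists.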

