QM | LG | Oct 11, 2024

KinDEL: DNA-Encoded Library Dataset for Kinase Inhibitors

arXiv:2410.08938v2 | 3 citations | h-index: 19 | ICML
Originality: Synthesis-oriented
AI Analysis

KinDEL is a valuable resource for researchers in computational drug discovery, though the contribution is incremental: it primarily offers a new dataset rather than a novel method.

The authors address the scarcity of publicly available DNA-Encoded Library (DEL) datasets for machine learning in drug discovery by introducing KinDEL, a dataset of 81 million compounds focused on two kinases, MAPK14 and DDR1. They use the accompanying biophysical assay validation data, covering both on-DNA and off-DNA measurements, to evaluate a suite of machine learning techniques.

DNA-Encoded Libraries (DELs) represent a transformative technology in drug discovery, facilitating the high-throughput exploration of vast chemical spaces. Despite their potential, the scarcity of publicly available DEL datasets presents a bottleneck for the advancement of machine learning methodologies in this domain. To address this gap, we introduce KinDEL, one of the largest publicly accessible DEL datasets and the first one that includes binding poses from molecular docking experiments. Focused on two kinases, Mitogen-Activated Protein Kinase 14 (MAPK14) and Discoidin Domain Receptor Tyrosine Kinase 1 (DDR1), KinDEL includes 81 million compounds, offering a rich resource for computational exploration. Additionally, we provide comprehensive biophysical assay validation data, encompassing both on-DNA and off-DNA measurements, which we use to evaluate a suite of machine learning techniques, including novel structure-based probabilistic models. We hope that our benchmark, encompassing both 2D and 3D structures, will help advance the development of machine learning models for data-driven hit identification using DELs.

Code Implementations: 1 repo