LG · AI · Oct 10, 2023

Lo-Hi: Practical ML Drug Discovery Benchmark

arXiv:2310.06399v1 · 16 citations · h-index: 1 · Has Code
Originality: Incremental advance
AI Analysis

The work gives researchers and practitioners in drug discovery a more realistic benchmark, though the advance is incremental because it builds on existing benchmark efforts.

The authors address the problem of unrealistic benchmarks in machine learning for drug discovery by creating the Lo-Hi benchmark. It comprises Lead Optimization and Hit Identification tasks that better reflect the real drug discovery process. The authors also test state-of-the-art models to identify which perform better under practical settings.

Finding new drugs is getting harder and harder. One of the hopes of drug discovery is to use machine learning models to predict molecular properties. That is why models for molecular property prediction are being developed and tested on benchmarks such as MoleculeNet. However, existing benchmarks are unrealistic and differ too much from how the models are applied in practice. We have created a new practical Lo-Hi benchmark consisting of two tasks: Lead Optimization (Lo) and Hit Identification (Hi), corresponding to the real drug discovery process. For the Hi task, we designed a novel molecular splitting algorithm that solves the Balanced Vertex Minimum $k$-Cut problem. We tested state-of-the-art and classic ML models, revealing which work better under practical settings. We analyzed modern benchmarks and showed that they are unrealistic and overoptimistic.

Review: https://openreview.net/forum?id=H2Yb28qGLV
Lo-Hi benchmark: https://github.com/SteshinSS/lohi_neurips2023
Lo-Hi splitter library: https://github.com/SteshinSS/lohi_splitter
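
To make the idea of a similarity-aware split concrete, here is a minimal sketch. It is not the authors' Balanced Vertex Minimum $k$-Cut algorithm and does not use the lohi_splitter API. It assumes a precomputed pairwise similarity matrix, and the 0.4 threshold and all function names are hypothetical choices for illustration. The sketch links molecules whose similarity exceeds the threshold, finds connected components, and moves whole components into the test set, so that no test molecule is too similar to a training molecule.

```python
from collections import deque


def similarity_graph(sim, threshold):
    """Adjacency sets linking items whose similarity exceeds the threshold."""
    n = len(sim)
    adj = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if sim[i][j] > threshold:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def connected_components(adj):
    """Breadth-first search over the graph; returns a list of index lists."""
    seen = set()
    comps = []
    for start in range(len(adj)):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        comp = []
        while queue:
            node = queue.popleft()
            comp.append(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        comps.append(sorted(comp))
    return comps


def split_by_components(sim, threshold, test_fraction):
    """Assign whole components to test while the target size is not exceeded."""
    comps = connected_components(similarity_graph(sim, threshold))
    target = test_fraction * len(sim)
    train, test = [], []
    # Smallest components first, so the test set fills up gradually.
    for comp in sorted(comps, key=len):
        if len(test) + len(comp) <= target:
            test.extend(comp)
        else:
            train.extend(comp)
    return sorted(train), sorted(test)


# Toy symmetric similarity matrix (made-up values, not real fingerprints).
SIM = [
    [1.0, 0.8, 0.1, 0.1, 0.0],
    [0.8, 1.0, 0.2, 0.1, 0.1],
    [0.1, 0.2, 1.0, 0.7, 0.1],
    [0.1, 0.1, 0.7, 1.0, 0.2],
    [0.0, 0.1, 0.1, 0.2, 1.0],
]
train_idx, test_idx = split_by_components(SIM, threshold=0.4, test_fraction=0.4)
```

This component-based approach only works when the similarity graph breaks into several pieces. When most molecules fall into one large connected component, some vertices have to be removed to separate the parts, which is the harder setting that a vertex-cut formulation such as the one named in the abstract is meant to handle.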

Code Implementations: 2 repos
