BM AI LG | Mar 8, 2023

Infinite Physical Monkey: Do Deep Learning Methods Really Perform Better in Conformation Generation?

Haotian Zhang, Jintu Zhang, Huifeng Zhao, Dejun Jiang, Yafeng Deng

arXiv:2304.10494v12.31 citations | h-index: 23

Originality Synthesis-oriented

AI Analysis

This work questions the superiority of deep learning in drug discovery tasks and suggests that simpler methods can be effective. The contribution is incremental, since it builds on prior skepticism about DL performance.

The paper tackles the problem of molecular conformation generation by showing that a simple stochastic sampling method achieves higher coverage of benchmark conformations than most deep learning methods, and also performs competitively in binding pose prediction.

Conformation generation is a fundamental problem in drug discovery and cheminformatics, and the generation of organic molecule conformations, particularly in vacuum and in protein pocket environments, is the most relevant to drug design. Recently, with the development of geometric neural networks, data-driven schemes have been successfully applied in this field, both for molecular conformation generation (in vacuum) and for binding pose generation (in a protein pocket). The former beat the traditional ETKDG method, while the latter achieve accuracy similar to that of widely used molecular docking software. Although these methods have shown promising results, some researchers have recently used a parameter-free method to question whether deep learning (DL) methods really perform better in molecular conformation generation. To our surprise, what they designed is somewhat analogous to the famous infinite monkey theorem, with monkeys that are even equipped with a physics education. To examine the soundness of their proof, we constructed a truly infinite stochastic monkey for molecular conformation generation, showing that even with a more stochastic sampler for geometry generation, the coverage of the benchmark QM-computed conformations is higher than that of most DL-based methods. By extending their physical monkey algorithm to binding pose prediction, we also find that its docking success rate is near the best among existing DL-based docking models. Thus, although their conclusions are right, their proof process deserves closer scrutiny.
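
The abstract compares methods by their coverage of benchmark QM-computed conformations but does not define the metric. As a minimal sketch, assuming the definition commonly used in this literature (the fraction of reference conformers that have at least one generated conformer within an RMSD threshold), coverage could be computed as below. The function name `coverage`, the 1.25 Å threshold, and the RMSD values are illustrative assumptions, not taken from the paper.

```python
def coverage(rmsd, threshold):
    """Fraction of reference conformers matched by a generated conformer.

    rmsd[i][j] is the RMSD (in angstroms) between reference conformer i
    and generated conformer j. A reference conformer counts as covered
    when its closest generated conformer lies below `threshold`.
    """
    if not rmsd:
        raise ValueError("need at least one reference conformer")
    covered = sum(1 for row in rmsd if row and min(row) < threshold)
    return covered / len(rmsd)


# Hypothetical RMSD matrix: 3 reference conformers x 2 generated conformers.
example_rmsd = [
    [0.4, 2.1],  # closest match 0.4 -> covered
    [1.9, 1.0],  # closest match 1.0 -> covered
    [2.5, 3.0],  # closest match 2.5 -> not covered
]

# Threshold of 1.25 angstroms is an assumed value for illustration.
example_cov = coverage(example_rmsd, threshold=1.25)
```

Under this definition a sampler can raise its coverage simply by producing more, and more diverse, conformers, which is consistent with the abstract's observation that a purely stochastic sampler scores well on this metric.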
