MaskMol: Knowledge-guided Molecular Image Pre-Training Framework for Activity Cliffs
MaskMol addresses the challenge of distinguishing activity cliffs in drug discovery and offers a new method for molecular image representation learning and virtual screening. The contribution is incremental, however, because it builds on image-based approaches.
The paper tackles the problem of activity cliffs, where structurally similar molecules have large potency differences, by introducing MaskMol, a knowledge-guided molecular image pre-training framework. It outperforms 25 state-of-the-art methods in activity cliff estimation and potency prediction across 20 macromolecular targets, with high accuracy and transferability.
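To make the definition of an activity cliff concrete, here is a minimal sketch in Python. It is not taken from the paper: the set-based fingerprints, the function names `tanimoto` and `find_activity_cliffs`, the toy potency values, and the thresholds (similarity of at least 0.9, potency gap of at least one log unit) are all illustrative assumptions.

```python
from itertools import combinations


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def find_activity_cliffs(molecules, sim_threshold=0.9, potency_gap=1.0):
    """Return pairs that are structurally similar but far apart in potency.

    molecules: list of (name, fingerprint_bits, potency) tuples, where potency
    is on a log scale (for example pKi). Both thresholds are assumptions
    chosen for illustration, not values stated in the source.
    """
    cliffs = []
    for (n1, fp1, p1), (n2, fp2, p2) in combinations(molecules, 2):
        sim = tanimoto(fp1, fp2)
        gap = abs(p1 - p2)
        if sim >= sim_threshold and gap >= potency_gap:
            cliffs.append((n1, n2, sim, gap))
    return cliffs


# Hypothetical toy data: fingerprints are plain sets of "on" bit indices.
toy = [
    ("mol_a", frozenset(range(0, 20)), 8.5),
    ("mol_b", frozenset(range(1, 20)), 6.0),   # differs from mol_a by one bit
    ("mol_c", frozenset(range(10, 30)), 8.4),  # structurally distant
]

cliffs = find_activity_cliffs(toy)
for name_1, name_2, sim, gap in cliffs:
    print(f"{name_1} / {name_2}: similarity={sim:.2f}, potency gap={gap:.1f}")
```

In this toy set only `mol_a` and `mol_b` qualify: they share almost all fingerprint bits yet differ by 2.5 log units in potency. In practice the fingerprints would come from a cheminformatics toolkit rather than hand-written sets.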
Activity cliffs are pairs of molecules that are structurally similar but differ significantly in potency. They can cause model representations to collapse, making it difficult for a model to tell the two molecules apart. Our research indicates that as molecular similarity increases, graph-based methods struggle to capture these nuances, whereas image-based approaches effectively retain the distinctions. Thus, we developed MaskMol, a knowledge-guided molecular image self-supervised learning framework. MaskMol accurately learns the representation of molecular images by considering multiple levels of molecular knowledge, such as atoms, bonds, and substructures. By utilizing pixel masking tasks, MaskMol extracts fine-grained information from molecular images, overcoming the limitations of existing deep learning models in identifying subtle structural changes. Experimental results demonstrate MaskMol's high accuracy and transferability in activity cliff estimation and compound potency prediction across 20 different macromolecular targets, outperforming 25 state-of-the-art deep learning and machine learning approaches. Visualization analyses reveal MaskMol's high biological interpretability in identifying activity cliff-relevant molecular substructures. Notably, through MaskMol, we identified candidate EP4 inhibitors that could be used to treat tumors. This study not only raises awareness about activity cliffs but also introduces a novel method for molecular image representation learning and virtual screening, advancing drug discovery and providing new insights into structure-activity relationships (SAR).
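The pixel masking idea mentioned above can be illustrated with a small Python sketch. It shows only the masking step on a toy image and makes no claim about MaskMol's actual pre-training objective or architecture. The helper names `atom_box` and `mask_region`, the atom pixel coordinates, and the square mask shape are all hypothetical.

```python
def atom_box(center, half_size):
    """Square pixel box (row0, col0, row1, col1), end-exclusive, around a center."""
    row, col = center
    return (row - half_size, col - half_size,
            row + half_size + 1, col + half_size + 1)


def mask_region(image, box, fill=0):
    """Return a copy of `image` with every pixel inside `box` set to `fill`.

    image: list of rows, each a list of pixel values.
    The box is clipped to the image bounds, and the input is left unmodified.
    """
    r0, c0, r1, c1 = box
    masked = [row[:] for row in image]
    for r in range(max(r0, 0), min(r1, len(masked))):
        for c in range(max(c0, 0), min(c1, len(masked[r]))):
            masked[r][c] = fill
    return masked


# Toy 6x6 "molecular image" in which every pixel is 1.
image = [[1] * 6 for _ in range(6)]

# Hypothetical pixel coordinates of two atoms in the rendered image.
atom_centers = {"C1": (1, 1), "N2": (4, 4)}

# Mask a 3x3 patch around atom N2; a model could then be asked to recover
# information about the hidden region.
masked = mask_region(image, atom_box(atom_centers["N2"], half_size=1))
for row in masked:
    print(row)
```

The same helper could mask at other knowledge levels by passing a box that covers a bond or a whole substructure instead of a single atom, assuming the pixel coordinates of those elements are known from the rendering step.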