Cell Morphology-Guided Small Molecule Generation with GFlowNets
This work addresses the challenge of limited labeled data in drug discovery by enabling molecular design guided by high-content imaging (HCI), an incremental advance in generative models for therapeutics.
The paper tackles the problem of generating small molecules guided by high-content imaging (HCI) data without relying on labeled phenotypic annotations. It uses an unsupervised multimodal joint embedding to define the reward for GFlowNets, and shows that the generated molecules have high morphological and structural similarity to the targets, which increases the likelihood of similar biological activity.
High-content phenotypic screening, including high-content imaging (HCI), has gained popularity in the last few years for its ability to characterize novel therapeutics without prior knowledge of the protein target. When combined with deep learning techniques to predict and represent molecular-phenotype interactions, these advancements hold the potential to significantly accelerate and enhance drug discovery applications. This work focuses on the novel task of HCI-guided molecular design. Generative models for molecule design could be guided by HCI data, for example with a supervised model that links molecules to phenotypes of interest as a reward function. However, limited labeled data, combined with the high-dimensional readouts, can make training these methods challenging and impractical. We consider an alternative approach in which we leverage an unsupervised multimodal joint embedding to define a latent similarity as a reward for GFlowNets. The proposed model learns to generate new molecules that could produce phenotypic effects similar to those of the given image target, without relying on pre-annotated phenotypic labels. We demonstrate that the proposed method generates molecules with high morphological and structural similarity to the target, increasing the likelihood of similar biological activity, as confirmed by an independent oracle model.
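The latent-similarity reward described above can be illustrated with a minimal sketch. It assumes that a candidate molecule and the target image have already been mapped into the shared embedding space by their respective encoders, which are not shown. The abstract does not specify the similarity measure or the reward shaping, so the use of cosine similarity, the rescaling to [0, 1], and the names `latent_similarity_reward`, `beta`, and `floor` are all hypothetical choices made for illustration. The only constraint taken as given is that a GFlowNet reward must be strictly positive.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors, clamped to [-1, 1]."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as having no similarity
    # Clamp to guard against small floating-point overshoot.
    return max(-1.0, min(1.0, dot / (norm_u * norm_v)))


def latent_similarity_reward(
    molecule_embedding: Sequence[float],
    target_image_embedding: Sequence[float],
    beta: float = 1.0,    # hypothetical sharpening exponent
    floor: float = 1e-6,  # hypothetical lower bound keeping the reward positive
) -> float:
    """Reward for a candidate molecule given a target morphology embedding.

    Assumption: both embeddings live in the same joint latent space.
    The similarity is rescaled from [-1, 1] to [0, 1], raised to `beta`,
    and floored so that the reward is strictly positive.
    """
    similarity = cosine_similarity(molecule_embedding, target_image_embedding)
    scaled = (similarity + 1.0) / 2.0
    return max(scaled ** beta, floor)


if __name__ == "__main__":
    target = [0.2, 0.9, -0.1]      # toy target image embedding
    candidate = [0.25, 0.8, 0.0]   # toy molecule embedding
    print(latent_similarity_reward(candidate, target, beta=4.0))
```

In this sketch a larger `beta` concentrates the reward on molecules whose embeddings lie closest to the target, so a GFlowNet trained to sample in proportion to it would favor those candidates more strongly.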