Causal integration of chemical structures improves representations of microscopy images for morphological profiling
This work addresses multimodal data integration in microscopy screening for drug discovery. It offers an incremental improvement by explicitly modeling compounds as treatments in a causal framework.
The paper aims to improve learned representations for cellular morphological profiling in high-throughput microscopy screens by incorporating chemical compound structures during self-supervised pre-training. The resulting method, MICON, significantly outperforms CellProfiler features and other deep-learning approaches at identifying reproducible drug effects across independent replicates and data-generating centers.
Recent advances in self-supervised deep learning have improved our ability to quantify cellular morphological changes in high-throughput microscopy screens, a process known as morphological profiling. However, most current methods learn only from images, even though many screens are inherently multimodal: they involve both a chemical or genetic perturbation and an image-based readout. We hypothesized that incorporating chemical compound structure during self-supervised pre-training could improve learned representations of images in high-throughput microscopy screens. We introduce a representation learning framework, MICON (Molecular-Image Contrastive Learning), that models chemical compounds as treatments that induce counterfactual transformations of cell phenotypes. MICON significantly outperforms classical hand-crafted features, such as those produced by CellProfiler, and existing deep-learning-based representation learning methods in challenging evaluation settings where models must identify reproducible effects of drugs across independent replicates and data-generating centers. We demonstrate that incorporating chemical compound information into the learning process provides consistent improvements in our evaluation setting, and that modeling compounds specifically as treatments in a causal framework outperforms approaches that directly align images and compounds in a single representation space. Our findings point to a new direction for representation learning in morphological profiling, suggesting that methods should explicitly account for the multimodal nature of microscopy screening data.
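The abstract contrasts two ways of using compound information: aligning images and compounds directly in one representation space, and treating a compound as an intervention that transforms a control phenotype into a treated one. The NumPy sketch below illustrates that distinction only. It is not MICON's architecture or loss, which the abstract does not specify. The InfoNCE objective, the additive transformation, the matrix `W`, and all function names are assumptions made for illustration.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Scale each row to (approximately) unit length.
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def info_nce(queries, keys, temperature=0.1):
    # Standard InfoNCE: queries[i] should match keys[i] against all other keys.
    logits = l2_normalize(queries) @ l2_normalize(keys).T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

def direct_alignment_loss(treated_img_emb, compound_emb):
    # Baseline idea: pull each treated-image embedding toward its compound
    # embedding in one shared space (both must have the same dimension).
    return info_nce(treated_img_emb, compound_emb)

def treatment_loss(control_img_emb, compound_emb, treated_img_emb, W):
    # Treatment idea (assumed form): the compound shifts the control embedding
    # to a predicted counterfactual, which is contrasted with the embeddings
    # of actually treated cells. The additive shift via W is hypothetical.
    predicted = control_img_emb + compound_emb @ W
    return info_nce(predicted, treated_img_emb)

# Toy data: 8 compounds, 16-d image embeddings, 12-d compound embeddings.
rng = np.random.default_rng(0)
n, d_img, d_cmp = 8, 16, 12
control = rng.normal(size=(n, d_img))
compound = rng.normal(size=(n, d_cmp))
W_true = rng.normal(size=(d_cmp, d_img)) / np.sqrt(d_cmp)
treated = control + compound @ W_true  # synthetic "treated" embeddings

loss_with_true_shift = treatment_loss(control, compound, treated, W_true)
loss_without_shift = treatment_loss(control, compound, treated, np.zeros_like(W_true))
print(loss_with_true_shift, loss_without_shift)
```

In this toy setup the treated embeddings are generated by the same additive rule the loss assumes, so the loss with `W_true` is near its minimum by construction. In a real pipeline the image encoder, the compound encoder, and the transformation would all be learned jointly.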