CV · QM · Nov 8, 2023

Weakly supervised cross-modal learning in high-content screening

arXiv:2311.04678v2 · 6 citations · h-index: 5
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of bridging image and molecular data for drug discovery in high-content screening. It is an incremental improvement with domain-specific applications.

The paper tackles the problem of learning cross-modal representations between image data and molecular representations for drug discovery, introducing EMM and IMM loss functions that leverage weak supervision and cross-site replicates in High-Content Screening. The approach learns better representations and mitigates batch effects in cross-modal retrieval, while also presenting a preprocessing method that reduces the JUMP-CP dataset size from 85TB to 7TB while retaining most information.
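Cross-modal retrieval of the kind mentioned above is commonly scored with recall@k: for each image embedding, check whether its paired molecule appears among the k most similar molecule embeddings. The following is a minimal sketch under stated assumptions, not the paper's evaluation code. The function name `recall_at_k`, the use of cosine similarity, and the convention that query `i` is paired with gallery item `i` are all illustrative choices that do not come from the source.

```python
import numpy as np

def recall_at_k(query_emb, gallery_emb, k=1):
    """Fraction of queries whose true match is among the top-k gallery items.

    Assumption (not from the paper): row i of query_emb is paired with
    row i of gallery_emb, and similarity is cosine similarity.
    """
    # L2-normalise so that the dot product equals cosine similarity.
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    g = gallery_emb / np.linalg.norm(gallery_emb, axis=1, keepdims=True)
    sims = q @ g.T  # shape: (n_queries, n_gallery)

    # Indices of the k most similar gallery items for each query.
    topk = np.argsort(-sims, axis=1)[:, :k]

    # A query is a hit if its own index appears in its top-k list.
    targets = np.arange(len(q))[:, None]
    hits = (topk == targets).any(axis=1)
    return float(hits.mean())
```

Calling `recall_at_k(image_emb, mol_emb, k=5)` would give image-to-molecule retrieval; swapping the arguments gives the molecule-to-image direction.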

With the surge in available data from various modalities, there is a growing need to bridge the gap between different data types. In this work, we introduce a novel approach to learning cross-modal representations between image data and molecular representations for drug discovery. We propose EMM and IMM, two loss functions built on top of CLIP that leverage weak supervision and cross-site replicates in High-Content Screening. Evaluating our model against known baselines on cross-modal retrieval, we show that the proposed approach learns better representations and mitigates batch effects. In addition, we present a preprocessing method for the JUMP-CP dataset that reduces the required storage from 85 TB to a usable 7 TB while retaining all perturbations and most of the information content.
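For context, the CLIP objective that EMM and IMM are said to build on is a symmetric contrastive loss over a batch of paired embeddings. The sketch below shows only that standard base loss in NumPy; it does not reproduce EMM or IMM, whose definitions are not given here. The function names, the `temperature` default of 0.07, and the pairing of image `i` with molecule `i` are assumptions made for illustration.

```python
import numpy as np

def log_softmax(logits, axis):
    # Numerically stable log-softmax along the given axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def clip_loss(image_emb, mol_emb, temperature=0.07):
    """Symmetric CLIP-style contrastive loss for paired embeddings.

    Assumption: row i of image_emb and row i of mol_emb form a positive
    pair; every other row in the batch serves as a negative.
    """
    # L2-normalise so that logits are scaled cosine similarities.
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    mol_emb = mol_emb / np.linalg.norm(mol_emb, axis=1, keepdims=True)
    logits = image_emb @ mol_emb.T / temperature  # shape: (batch, batch)

    idx = np.arange(logits.shape[0])
    # Image-to-molecule direction: softmax over each row.
    loss_i2m = -log_softmax(logits, axis=1)[idx, idx].mean()
    # Molecule-to-image direction: softmax over each column.
    loss_m2i = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_i2m + loss_m2i)
```

The loss is small when each image embedding is closest to its own molecule embedding and large when pairs are mismatched, which is the property that weak-supervision variants would then modify.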

Code Implementations: 1 repo
