CV, Sep 24, 2025

Efficient Cell Painting Image Representation Learning via Cross-Well Aligned Masked Siamese Network

Pin-Jui Huang, Yu-Hsuan Liao, SooHeon Kim, NoSeong Park, JongBae Park, DongMyung Shin

arXiv:2509.19896v1, 1 citation, h-index: 1

Originality: Highly original

AI Analysis

This work addresses efficient phenotype modeling for drug discovery researchers, offering a more data- and parameter-efficient solution, though it is incremental in that it builds on existing self-supervised and contrastive learning approaches.

The paper tackles the challenge of extracting biologically meaningful and batch-robust representations from cell painting images for drug discovery. Its CWA-MSN framework outperforms state-of-the-art methods by up to 29% in benchmark scores while training on substantially less data or using a much smaller model.

Computational models that predict cellular phenotypic responses to chemical and genetic perturbations can accelerate drug discovery by prioritizing therapeutic hypotheses and reducing costly wet-lab iteration. However, extracting biologically meaningful and batch-robust representations from cell painting images remains challenging. Conventional self-supervised and contrastive learning approaches often require a large-scale model and/or a large amount of carefully curated data, yet still struggle with batch effects. We present Cross-Well Aligned Masked Siamese Network (CWA-MSN), a novel representation learning framework that aligns embeddings of cells subjected to the same perturbation across different wells, enforcing semantic consistency despite batch effects. Integrated into a masked siamese architecture, this alignment yields features that capture fine-grained morphology while remaining data- and parameter-efficient. For instance, in a gene-gene relationship retrieval benchmark, CWA-MSN outperforms the state-of-the-art publicly available self-supervised (OpenPhenom) and contrastive learning (CellCLIP) methods, improving the benchmark scores by +29% and +9%, respectively, while training on substantially less data (e.g., 0.2M images for CWA-MSN vs. 2.2M images for OpenPhenom) or with a smaller model (e.g., 22M parameters for CWA-MSN vs. 1.48B parameters for CellCLIP). Extensive experiments demonstrate that CWA-MSN is a simple and effective way to learn cell image representations, enabling efficient phenotype modeling even under limited data and parameter budgets.
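The cross-well alignment idea described above, pairing cells that received the same perturbation but sit in different wells, can be illustrated with a small pair-sampling sketch. This is not the authors' implementation: the abstract does not specify how pairs are drawn or which loss is applied, so the function name `sample_cross_well_pairs`, the `(image_id, perturbation, well)` record layout, and the uniform random choice of the positive well are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def sample_cross_well_pairs(records, rng=None):
    """Build (anchor_id, positive_id) pairs for cross-well alignment.

    `records` is a list of (image_id, perturbation, well) tuples; this
    layout is hypothetical. Each anchor is paired with an image that
    shares its perturbation but comes from a different well.
    """
    rng = rng or random.Random(0)  # fixed seed so the sketch is reproducible

    # Index image ids by perturbation, then by well.
    by_pert = defaultdict(lambda: defaultdict(list))
    for image_id, pert, well in records:
        by_pert[pert][well].append(image_id)

    pairs = []
    for image_id, pert, well in records:
        other_wells = [w for w in by_pert[pert] if w != well]
        if not other_wells:
            # Perturbation observed in a single well: no cross-well positive.
            continue
        positive_well = rng.choice(other_wells)
        positive_id = rng.choice(by_pert[pert][positive_well])
        pairs.append((image_id, positive_id))
    return pairs


# Hypothetical metadata: two wells share the perturbation "geneX".
example = [
    ("img_a", "geneX", "plate1_A01"),
    ("img_b", "geneX", "plate2_B03"),
    ("img_c", "geneY", "plate1_A02"),
]
print(sample_cross_well_pairs(example))
```

In a training loop, the two images of each pair would be encoded by the two branches of the siamese network so that their embeddings are pulled together. Because the pair spans wells, well-specific batch signal cannot serve as a shortcut for matching them.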
