CVMar 26, 2025

Cross-Modal Prototype Allocation: Unsupervised Slide Representation Learning via Patch-Text Contrast in Computational Pathology

Yuxuan Chen, Jiawen Li, Jiali Hu, Xitong Ling, Tian Guan, Anjia Han, Yonghong He

arXiv:2503.20190v11 citationsh-index: 9BIBM

Originality Incremental advance

AI Analysis

This work addresses the need for more generalizable slide representations in computational pathology, though it is incremental as it builds on existing unsupervised methods by incorporating textual data.

The paper tackled the problem of limited generalizability in weakly supervised slide representation learning for computational pathology by proposing ProAlign, an unsupervised framework that uses patch-text contrast and prototype embeddings, achieving performance comparable to some weakly supervised models on four public datasets.

With the rapid advancement of pathology foundation models (FMs), the representation learning of whole slide images (WSIs) attracts increasing attention. Existing studies develop high-quality patch feature extractors and employ carefully designed aggregation schemes to derive slide-level representations. However, mainstream weakly supervised slide representation learning methods, primarily based on multiple instance learning (MIL), are tailored to specific downstream tasks, which limits their generalizability. To address this issue, some studies explore unsupervised slide representation learning. However, these approaches focus solely on the visual modality of patches, neglecting the rich semantic information embedded in textual data. In this work, we propose ProAlign, a cross-modal unsupervised slide representation learning framework. Specifically, we leverage a large language model (LLM) to generate descriptive text for the prototype types present in a WSI, introducing patch-text contrast to construct initial prototype embeddings. Furthermore, we propose a parameter-free attention aggregation strategy that utilizes the similarity between patches and these prototypes to form unsupervised slide embeddings applicable to a wide range of downstream tasks. Extensive experiments on four public datasets show that ProAlign outperforms existing unsupervised frameworks and achieves performance comparable to some weakly supervised models.

View on arXiv PDF

Similar