IV | AI | CV | IR | QM | Jan 29, 2025

Aggregation Schemes for Single-Vector WSI Representation Learning in Digital Pathology

arXiv:2501.17822v2 | h-index: 10
Originality: Synthesis-oriented
AI Analysis

This work addresses efficient WSI representation for digital pathology. Its contribution is incremental: it benchmarks existing aggregation methods rather than introducing new ones.

The paper evaluates multiple aggregation techniques for generating a single-vector representation of a Whole Slide Image (WSI) from its patch embeddings. It benchmarks methods such as pooling, Deep Sets, and Fisher Vectors on WSI search over TCGA data from four primary sites: bladder, breast, kidney, and colon.

A crucial step in efficiently integrating Whole Slide Images (WSIs) into computational pathology is assigning a single high-quality feature vector, i.e., one embedding, to each WSI. With many pre-trained deep neural networks available and foundation models emerging, extracting embeddings for sub-images (i.e., tiles or patches) is straightforward. WSIs, however, are gigapixel images, and their resolution makes it infeasible to process one as a single image on existing GPUs. As a result, WSIs are usually split into many patches. Each patch is fed to a pre-trained model, so each WSI is represented by a set of patches and hence by a set of embeddings. In such a setup, WSI representation learning reduces to set representation learning, where for each WSI we have access to a set of patch embeddings. To obtain a single embedding from this set, multiple set-based learning schemes have been proposed in the literature. In this paper, we evaluate the WSI search performance of multiple recently developed aggregation techniques (mainly set representation learning techniques), including simple average or max pooling operations, Deep Sets, Memory networks, Focal attention, Gaussian Mixture Model (GMM) Fisher Vector, and deep sparse and binary Fisher Vector, on four different primary sites from TCGA: bladder, breast, kidney, and colon. Further, we benchmark the search performance of these methods against the median of minimum distances of patch embeddings, a non-aggregating approach used for WSI retrieval.
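To make the simplest schemes named in the abstract concrete, here is a minimal NumPy sketch of average pooling, max pooling, and a median-of-minimum-distances comparison between two sets of patch embeddings. It is illustrative only and is not the paper's implementation. The function names, the use of Euclidean distance, and the direction of the comparison (query patches matched against reference patches) are assumptions, since the abstract does not specify them.

```python
import numpy as np


def mean_pool(patches: np.ndarray) -> np.ndarray:
    """Average pooling: one vector per WSI from an (n_patches, dim) array."""
    return patches.mean(axis=0)


def max_pool(patches: np.ndarray) -> np.ndarray:
    """Max pooling: element-wise maximum over the patch axis."""
    return patches.max(axis=0)


def median_of_min_distances(query: np.ndarray, reference: np.ndarray) -> float:
    """Non-aggregating comparison of two patch-embedding sets.

    For every query patch, find its nearest reference patch, then take the
    median of those nearest-neighbour distances.
    Assumption: Euclidean distance, query-to-reference direction only.
    """
    # Broadcasting gives pairwise differences of shape (n_query, n_ref, dim).
    diffs = query[:, None, :] - reference[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)  # shape (n_query, n_ref)
    return float(np.median(dists.min(axis=1)))


if __name__ == "__main__":
    # Hypothetical data: two WSIs with 5 and 7 patches, 8-dim embeddings.
    rng = np.random.default_rng(0)
    wsi_a = rng.normal(size=(5, 8))
    wsi_b = rng.normal(size=(7, 8))

    # Single-vector route: pool each WSI, then compare the two vectors.
    pooled_distance = np.linalg.norm(mean_pool(wsi_a) - mean_pool(wsi_b))
    print("distance between mean-pooled vectors:", pooled_distance)

    # Set-to-set route: no single vector is ever formed.
    print("median of minimum distances:", median_of_min_distances(wsi_a, wsi_b))
```

The pooled route stores one vector per WSI, so search cost is independent of patch count. The set-to-set route keeps every patch embedding and computes all pairwise distances per comparison, which is the trade-off the benchmark in the abstract is built around.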

