IV | Sep 6, 2024
A Short Survey on Set-Based Aggregation Techniques for Single-Vector WSI Representation in Digital Pathology
S. Hemati, Krishna R. Kalari, H. R. Tizhoosh
Digital pathology is revolutionizing the field of pathology by enabling the digitization, storage, and analysis of tissue samples as whole slide images (WSIs). WSIs are gigapixel files that capture the intricate details of tissue samples, providing a rich source of information for diagnostic and research purposes. However, due to their enormous size, representing these images as one compact vector is essential for many computational pathology tasks, such as search and retrieval, to ensure efficiency and scalability. Most current methods are "patch-oriented," meaning they divide WSIs into smaller patches for processing, which prevents a holistic analysis of the entire slide. Additionally, the necessity for compact representation is driven by the expensive high-performance storage required for WSIs. Not all hospitals have access to such extensive storage solutions, leading to potential disparities in healthcare quality and accessibility. This paper provides an overview of existing set-based approaches to single-vector WSI representation, highlighting the innovations that allow for more efficient and effective use of these complex images in digital pathology, thus addressing both computational challenges and storage limitations.
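A set-based aggregator treats a WSI as an unordered bag of patch embeddings and maps it to one fixed-length vector, so the result must not depend on patch order. The sketch below uses mean and max pooling, the simplest permutation-invariant aggregators. It is an illustration only and is not claimed to be any specific method from the survey. The function names are hypothetical, and patch embeddings are assumed to be equal-length lists of floats.

```python
from typing import List, Sequence

Vector = List[float]


def mean_pool(patches: Sequence[Sequence[float]]) -> Vector:
    """Average every embedding dimension over the set of patches."""
    if not patches:
        raise ValueError("a WSI needs at least one patch embedding")
    dim = len(patches[0])
    return [sum(p[d] for p in patches) / len(patches) for d in range(dim)]


def max_pool(patches: Sequence[Sequence[float]]) -> Vector:
    """Take the per-dimension maximum over the set of patches."""
    if not patches:
        raise ValueError("a WSI needs at least one patch embedding")
    dim = len(patches[0])
    return [max(p[d] for p in patches) for d in range(dim)]


if __name__ == "__main__":
    wsi = [[1.0, 0.0, 2.0], [3.0, 4.0, 0.0]]  # two toy patch embeddings
    print(mean_pool(wsi))  # [2.0, 2.0, 1.0]
    print(max_pool(wsi))   # [3.0, 4.0, 2.0]
    # Reordering the patches leaves the single-vector representation unchanged.
    assert mean_pool(wsi) == mean_pool(wsi[::-1])
```

Both functions collapse a slide of any size into a vector of the patch-embedding dimension. Learned set aggregators replace the fixed mean or max with trainable, but still order-independent, operations.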
15.5 | CV | May 22
CRISP -- Clustering-Based Redundancy-Reduced Instance Sampling for Pathology Case Representation and Retrieval
Zahra Rahimi Afzal, Wataru Uegami, Saghir Alfasly et al.
Digital pathology archives increasingly contain multiple whole-slide images (WSIs) per case, capturing spatially distinct tumour regions and reflecting intrinsic morphological heterogeneity. However, most existing approaches rely on a single pathologist-selected slide, thereby discarding potentially informative evidence distributed across the remaining WSIs. To date, no autonomous framework has been proposed for comprehensive multi-WSI case processing. Here, we present an unsupervised framework for case-level analysis that integrates information from all available slides within a case. Rather than relying on a single designated slide, the proposed approach constructs case-level representations by selectively distilling informative patches across WSIs. We introduce Clustering-Based Redundancy-Reduced Instance Sampling for Pathology (CRISP), a two-stage framework that first reduces redundancy within individual WSIs and subsequently applies clustering-based sampling to select a compact yet representative set of patches for the entire case. The resulting patch set captures case-level heterogeneity while avoiding exhaustive processing of gigapixel images, and directly serves as a retrieval index. Using two Mayo Clinic breast cancer datasets for diagnosis and treatment planning, we demonstrate that CRISP consistently matches or surpasses the current standard practice of combined model and pathologist slide selection for patient/case search and retrieval. By automating case-level processing and eliminating subjective WSI selection, CRISP potentially enables the exploitation of clinically relevant information distributed across multiple WSIs that is currently overlooked.
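The two-stage idea (reduce redundancy inside each WSI, then cluster across the whole case and sample) can be sketched as below. The abstract does not give CRISP's actual criteria. This sketch therefore assumes a greedy cosine-similarity filter for stage 1 and plain k-means with nearest-to-centroid selection for stage 2. The threshold, `k`, and all function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vec = Sequence[float]


def cosine(a: Vec, b: Vec) -> float:
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def sq_dist(a: Vec, b: Vec) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dedupe_slide(patches: Sequence[Vec], threshold: float) -> List[int]:
    """Stage 1 (assumed form): keep a patch only if it is not a near-duplicate
    (cosine >= threshold) of a patch already kept from the same slide."""
    kept: List[int] = []
    for i, p in enumerate(patches):
        if all(cosine(p, patches[j]) < threshold for j in kept):
            kept.append(i)
    return kept


def kmeans(points: List[Vec], k: int, iters: int = 20) -> List[List[float]]:
    """Plain Lloyd's k-means, deterministically seeded with the first k points."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        groups: List[List[Vec]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            groups[nearest].append(p)
        for c, members in enumerate(groups):
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def sample_case(slides: Dict[str, List[Vec]], k: int,
                threshold: float = 0.95) -> List[Tuple[str, int]]:
    """Stage 2 (assumed form): cluster the deduplicated patches of all slides
    in a case and keep the patch closest to each centroid.
    Returns (slide_id, original patch index) pairs."""
    pool: List[Tuple[str, int]] = []
    vectors: List[Vec] = []
    for slide_id, patches in slides.items():
        for i in dedupe_slide(patches, threshold):
            pool.append((slide_id, i))
            vectors.append(patches[i])
    k = min(k, len(vectors))
    chosen: List[Tuple[str, int]] = []
    for centroid in kmeans(vectors, k):
        best = min(range(len(vectors)), key=lambda i: sq_dist(vectors[i], centroid))
        if pool[best] not in chosen:
            chosen.append(pool[best])
    return chosen
```

For a toy case `{"A": [[1.0, 0.0], [0.99, 0.01], [0.0, 1.0]], "B": [[0.0, 1.0], [1.0, 0.0]]}` with `k=2`, stage 1 drops the near-duplicate second patch of slide A. Stage 2 then returns one representative per morphological cluster, `[("A", 0), ("A", 2)]`. Seeding k-means with the first points keeps the sketch deterministic. A practical pipeline would more likely use k-means++ or a library implementation.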
IV | Sep 19, 2024
Multimodal Learning for Scalable Representation of High-Dimensional Medical Data
Areej Alsaafin, Abubakr Shafique, Saghir Alfasly et al.
Integrating artificial intelligence (AI) with healthcare data is rapidly transforming medical diagnostics and driving progress toward precision medicine. However, effectively leveraging multimodal data, particularly digital pathology whole slide images (WSIs) and genomic sequencing, remains a significant challenge due to the intrinsic heterogeneity of these modalities and the need for scalable and interpretable frameworks. Existing diagnostic models typically operate on unimodal data, overlooking critical cross-modal interactions that can yield richer clinical insights. We introduce MarbliX (Multimodal Association and Retrieval with Binary Latent Indexed matriX), a self-supervised framework that learns to embed WSIs and immunogenomic profiles into compact, scalable binary codes, termed "monograms." By optimizing a triplet contrastive objective across modalities, MarbliX captures high-resolution patient similarity in a unified latent space, enabling efficient retrieval of clinically relevant cases and facilitating case-based reasoning. In lung cancer, MarbliX achieves 85-89% across all evaluation metrics, outperforming histopathology (69-71%) and immunogenomics (73-76%). In kidney cancer, real-valued monograms yield the strongest performance (F1: 80-83%, Accuracy: 87-90%), with binary monograms slightly lower (F1: 78-82%).
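The core mechanics named in the abstract are a cross-modal triplet objective, binary codes, and retrieval over those codes. They can be illustrated with the sketch below. It assumes a Euclidean hinge triplet loss with margin 1.0, sign thresholding to produce the binary code, and Hamming-distance ranking. None of these choices is confirmed by the abstract, and the function names are hypothetical. In the cross-modal setting, the anchor would be one patient's WSI embedding, the positive that patient's immunogenomic embedding, and the negative another patient's.

```python
import math
from typing import Dict, List, Sequence, Tuple

Code = Tuple[int, ...]


def l2(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor: Sequence[float], positive: Sequence[float],
                 negative: Sequence[float], margin: float = 1.0) -> float:
    """Hinge triplet loss: the positive must be closer to the anchor than
    the negative by at least `margin`, otherwise a penalty is incurred."""
    return max(0.0, l2(anchor, positive) - l2(anchor, negative) + margin)


def binarize(embedding: Sequence[float]) -> Code:
    """Sign thresholding (assumed) turns a real-valued embedding into bits."""
    return tuple(1 if x > 0 else 0 for x in embedding)


def hamming(a: Code, b: Code) -> int:
    return sum(x != y for x, y in zip(a, b))


def retrieve(query: Sequence[float], index: Dict[str, Code],
             top_k: int = 1) -> List[str]:
    """Rank indexed cases by Hamming distance to the binarized query."""
    q = binarize(query)
    ranked = sorted(index, key=lambda case_id: hamming(q, index[case_id]))
    return ranked[:top_k]


if __name__ == "__main__":
    # Positive already closer than negative by more than the margin: no loss.
    print(triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0]))  # 0.0
    # Positive and negative swapped: loss = 5 - 1 + 1.
    print(triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0]))  # 5.0
    index = {"p1": (1, 0, 1, 1), "p2": (0, 1, 0, 0)}
    print(retrieve([0.3, -0.2, 0.9, 0.1], index))  # ['p1']
```

Binary codes let the index be compared with cheap bit operations. This is consistent with the abstract's report that binary monograms trade a little accuracy against real-valued ones.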
12.0 | CV | Apr 28
Validation of Whole-Slide Foundation Models for Image Retrieval in TCGA Data
Tianhao Lei, Parsa Esmaeilkhani, Saghir Alfasly et al.
Foundation models are reshaping computational histopathology, yet their value for whole-slide image retrieval relative to strong patch-based and supervised aggregation baselines remains unclear. We benchmarked ten pipelines on 9,387 diagnostic slides spanning 17 organs and 60 diagnoses from The Cancer Genome Atlas (TCGA) using patient-level leave-one-patient-out evaluation. Methods included four pre-trained slide foundation models, a supervised attention-based multiple instance learning (ABMIL) aggregator on patch embeddings, and patch-level retrieval across five sampling densities. Performance varied more across organs and diagnoses than across architectures. Although the slide foundation model TITAN achieved the strongest overall results, its advantage was modest; ABMIL and patch-based methods reached comparable Top-1 and Top-3 accuracy, with no model consistently dominant. Morphologically distinctive entities approached ceiling performance, while rare, heterogeneous, and closely related subtypes remained challenging. Misclassifications aligned with organs exhibiting known inter-observer variability, suggesting an intrinsic ceiling for morphology-only retrieval. Performance was driven primarily by patch-level feature representations, with limited benefit from slide-level aggregation, indicating aggregation may be unnecessary in many settings. These findings argue against a universally optimal architecture and instead support organ-resolved benchmarking, diagnosis-aware or ensemble strategies, stronger feature representations, and multimodal retrieval frameworks. Notably, even the best model achieved only ≈68% ± 21% retrieval accuracy on TCGA, and some subtypes showed 0% accuracy across all methods, highlighting fundamental limitations of morphology-based representations and the need for substantial progress before reliable clinical deployment.
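Patient-level leave-one-patient-out evaluation means that when a slide is used as a query, every slide from the same patient is removed from the search space. Otherwise a near-identical sibling slide could inflate accuracy. The sketch below assumes one embedding per slide, cosine similarity for ranking, and Top-k accuracy scored by majority vote over the k nearest slides. The abstract does not say which Top-k convention the benchmark uses, so majority vote is an assumption here. `Slide` and `lopo_accuracy` are hypothetical names.

```python
import math
from collections import Counter
from typing import List, NamedTuple, Sequence


class Slide(NamedTuple):
    patient: str
    diagnosis: str
    embedding: List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def lopo_accuracy(slides: List[Slide], k: int = 1) -> float:
    """Top-k majority-vote accuracy with all same-patient slides excluded."""
    correct = 0
    for query in slides:
        # Leave the query's patient out of the search space entirely.
        candidates = [s for s in slides if s.patient != query.patient]
        ranked = sorted(candidates,
                        key=lambda s: cosine(query.embedding, s.embedding),
                        reverse=True)
        votes = Counter(s.diagnosis for s in ranked[:k])
        # Ties go to the label seen first, i.e. the higher-ranked slide.
        if votes and votes.most_common(1)[0][0] == query.diagnosis:
            correct += 1
    return correct / len(slides) if slides else 0.0


if __name__ == "__main__":
    # Two slides of the same patient match each other perfectly, but that
    # match is excluded, so only the other patient's slide can be retrieved.
    toy = [Slide("p1", "A", [1.0, 0.0]), Slide("p1", "A", [1.0, 0.0]),
           Slide("p2", "B", [0.0, 1.0])]
    print(lopo_accuracy(toy, k=1))  # 0.0
```

Without the patient filter, the two `p1` slides would retrieve each other and the toy accuracy would rise to 2/3. This is the kind of leakage the patient-level protocol is designed to prevent.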