CV · CL · GN · Jun 10, 2024

STimage-1K4M: A histopathology image-gene expression dataset for spatial transcriptomics

arXiv:2406.06393v2 · 31 citations
AI Analysis

This dataset supports multi-modal analysis in computational pathology by pairing each sub-tile region of a pathology image with its own gene expression profile.

The authors address the lack of detailed sub-tile genomic annotations in medical image-text datasets by introducing STimage-1K4M, a dataset of 1,149 images broken into 4,293,195 pairs of sub-tile images and gene expressions, with each tile paired with a 15,000-30,000 dimensional gene expression vector.

Recent advances in multi-modal algorithms have driven and been driven by the increasing availability of large image-text datasets, leading to significant strides in various fields, including computational pathology. However, in most existing medical image-text datasets, the text typically provides high-level summaries that may not sufficiently describe sub-tile regions within a large pathology image. For example, an image might cover an extensive tissue area containing cancerous and healthy regions, but the accompanying text might only specify that this image is a cancer slide, lacking the nuanced details needed for in-depth analysis. In this study, we introduce STimage-1K4M, a novel dataset designed to bridge this gap by providing genomic features for sub-tile images. STimage-1K4M contains 1,149 images derived from spatial transcriptomics data, which captures gene expression information at the level of individual spatial spots within a pathology image. Specifically, each image in the dataset is broken down into smaller sub-image tiles, with each tile paired with 15,000-30,000 dimensional gene expressions. With 4,293,195 pairs of sub-tile images and gene expressions, STimage-1K4M offers unprecedented granularity, paving the way for a wide range of advanced research in multi-modal data analysis and innovative applications in computational pathology and beyond.
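
The tile-and-expression pairing described in the abstract can be illustrated with a minimal sketch. It is not the dataset's actual loader or file format: the function name `extract_tile_pairs`, the square tile shape, the `(x, y)` pixel convention for spot centres, and the toy array sizes are all assumptions made for illustration.

```python
import numpy as np


def extract_tile_pairs(image, spot_xy, expression, tile_size):
    """Crop a square tile centred on each spot and pair it with that spot's
    gene expression vector. Hypothetical helper, not part of STimage-1K4M."""
    half = tile_size // 2
    height, width = image.shape[:2]
    pairs = []
    for (x, y), genes in zip(spot_xy, expression):
        top, left = y - half, x - half
        # Skip spots whose tile would extend past the image border.
        if top < 0 or left < 0 or top + tile_size > height or left + tile_size > width:
            continue
        tile = image[top:top + tile_size, left:left + tile_size]
        pairs.append((tile, genes))
    return pairs


# Toy stand-ins: a 64x64 RGB "slide", three spots, 20 genes per spot.
# Real slides are far larger and real vectors have 15,000-30,000 entries.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
spot_xy = np.array([[16, 16], [48, 16], [2, 2]])  # last spot sits too near the border
expression = rng.poisson(1.0, size=(3, 20))

pairs = extract_tile_pairs(image, spot_xy, expression, tile_size=16)
```

Each element of `pairs` is one (sub-tile image, expression vector) record. At the scale reported above, 4,293,195 pairs across 1,149 images works out to roughly 3,700 pairs per image on average.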

Code Implementations: 1 repo