CVNov 21, 2023

Long-MIL: Scaling Long Contextual Multiple Instance Learning for Histopathology Whole Slide Image Analysis

Honglin Li, Yunlong Zhang, Chenglu Zhu, Jiatong Cai, Sunyi Zheng, Lin Yang

arXiv:2311.12885v13.96 citationsh-index: 14

Originality Incremental advance

AI Analysis

This work addresses a practical problem in computer-aided cancer diagnosis by improving model robustness to varying WSI sizes, though it is incremental in adapting existing techniques to a specific domain.

The paper tackles the challenge of handling shape-varying whole slide images (WSIs) in histopathology by proposing Long-MIL, which introduces linear bias into attention for better position embedding extrapolation and uses Flash-Attention for computational efficiency. The method is evaluated on four datasets for classification and survival prediction tasks, showing superior performance on shape-varying WSIs.

Histopathology image analysis is the golden standard of clinical diagnosis for Cancers. In doctors daily routine and computer-aided diagnosis, the Whole Slide Image (WSI) of histopathology tissue is used for analysis. Because of the extremely large scale of resolution, previous methods generally divide the WSI into a large number of patches, then aggregate all patches within a WSI by Multi-Instance Learning (MIL) to make the slide-level prediction when developing computer-aided diagnosis tools. However, most previous WSI-MIL models using global-attention without pairwise interaction and any positional information, or self-attention with absolute position embedding can not well handle shape varying large WSIs, e.g. testing WSIs after model deployment may be larger than training WSIs, since the model development set is always limited due to the difficulty of histopathology WSIs collection. To deal with the problem, in this paper, we propose to amend position embedding for shape varying long-contextual WSI by introducing Linear Bias into Attention, and adapt it from 1-d long sequence into 2-d long-contextual WSI which helps model extrapolate position embedding to unseen or under-fitted positions. We further utilize Flash-Attention module to tackle the computational complexity of Transformer, which also keep full self-attention performance compared to previous attention approximation work. Our method, Long-contextual MIL (Long-MIL) are evaluated on extensive experiments including 4 dataset including WSI classification and survival prediction tasks to validate the superiority on shape varying WSIs. The source code will be open-accessed soon.

View on arXiv PDF

Similar