Kernel Attention Transformer (KAT) for Histopathology Whole Slide Image Classification
This work addresses computational bottlenecks in histopathology image analysis for medical diagnosis, representing an incremental improvement over existing methods.
The paper tackles the inefficiency of standard Transformers in processing gigapixel histopathology images by proposing a kernel attention Transformer (KAT). KAT replaces token-wise self-attention with cross-attention between the tokens and a set of kernels tied to positional anchors, which improves context modeling and reduces computational complexity. It reports better performance than 6 state-of-the-art methods on a gastric dataset with 2040 whole slide images (WSIs) and an endometrial dataset with 2560 WSIs.
The Transformer has been widely used in histopathology whole slide image (WSI) classification for tasks such as tumor grading and prognosis analysis. However, the token-wise self-attention and positional embedding strategy of the common Transformer limit its effectiveness and efficiency when applied to gigapixel histopathology images. In this paper, we propose a kernel attention Transformer (KAT) for histopathology WSI classification. Information is transmitted among the tokens through cross-attention between the tokens and a set of kernels related to a set of positional anchors on the WSI. Compared to the common Transformer structure, the proposed KAT better describes the hierarchical context information of the local regions of the WSI while maintaining a lower computational complexity. The proposed method was evaluated on a gastric dataset with 2040 WSIs and an endometrial dataset with 2560 WSIs, and was compared with 6 state-of-the-art methods. The experimental results demonstrate that the proposed KAT is effective and efficient in histopathology WSI classification and is superior to the state-of-the-art methods. The code is available at https://github.com/zhengyushan/kat.
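To make the idea of cross-attention between tokens and anchor-related kernels concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a single attention head with no learned projections, a hard distance mask of hypothetical radius `radius` around each anchor, and illustrative names (`kernel_cross_attention`, `token_xy`, `anchor_xy`) that do not come from the paper or its repository.

```python
import numpy as np


def softmax(scores, axis=-1):
    """Numerically stable softmax along the given axis."""
    scores = scores - scores.max(axis=axis, keepdims=True)
    weights = np.exp(scores)
    return weights / weights.sum(axis=axis, keepdims=True)


def kernel_cross_attention(tokens, kernels, token_xy, anchor_xy, radius):
    """One round of token <-> kernel cross-attention (illustrative only).

    tokens:    (n, d) patch features
    kernels:   (m, d) kernel vectors, one per positional anchor
    token_xy:  (n, 2) patch coordinates on the slide
    anchor_xy: (m, 2) anchor coordinates on the slide
    radius:    assumed hard cutoff; a kernel only exchanges information
               with tokens within this distance of its anchor
    """
    d = tokens.shape[1]

    # Pairwise anchor-to-token distances, shape (m, n).
    dist = np.linalg.norm(anchor_xy[:, None, :] - token_xy[None, :, :], axis=-1)
    mask = dist <= radius

    # Step 1: each kernel gathers information from the tokens near its anchor.
    # A fully masked row falls back to uniform weights rather than NaN.
    scores = kernels @ tokens.T / np.sqrt(d)               # (m, n)
    scores = np.where(mask, scores, -1e9)
    new_kernels = softmax(scores, axis=1) @ tokens         # (m, d)

    # Step 2: each token reads back from the kernels whose anchors are nearby.
    scores_t = tokens @ new_kernels.T / np.sqrt(d)         # (n, m)
    scores_t = np.where(mask.T, scores_t, -1e9)
    new_tokens = softmax(scores_t, axis=1) @ new_kernels   # (n, d)

    return new_tokens, new_kernels


# Usage with random data: 1000 tokens, 16 kernels, 32-dimensional features.
rng = np.random.default_rng(0)
n, m, d = 1000, 16, 32
tokens = rng.normal(size=(n, d))
kernels = rng.normal(size=(m, d))
token_xy = rng.uniform(0.0, 1.0, size=(n, 2))
anchor_xy = rng.uniform(0.0, 1.0, size=(m, 2))

new_tokens, new_kernels = kernel_cross_attention(
    tokens, kernels, token_xy, anchor_xy, radius=0.3
)
print(new_tokens.shape, new_kernels.shape)
```

In this sketch the two score matrices hold 2nm entries in total, compared with n² for token-wise self-attention, so the cost grows linearly in the number of tokens n when the number of kernels m is fixed and much smaller than n. Hierarchical context could be approximated by repeating the masking with several radii, which is again an assumption of this sketch rather than a description of the published model.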