Attention to detail: inter-resolution knowledge distillation
This work addresses computational limitations in digital pathology by making models for gigapixel images more efficient, though the contribution is incremental because it builds on existing knowledge distillation techniques.
The paper tackles the performance degradation that computer vision models for gigapixel pathology images suffer at lower resolutions. It proposes an attention-based knowledge distillation method that improves model performance across resolutions, with substantial gains reported in prostate histology grading experiments.
The development of computer vision solutions for gigapixel images in digital pathology is hampered by significant computational limitations due to the large size of whole slide images. In particular, digitizing biopsies at high resolution is time-consuming, yet necessary because results worsen as image detail decreases. To alleviate this issue, recent literature has proposed using knowledge distillation to enhance model performance at reduced image resolutions. Specifically, soft labels and features extracted at the highest magnification level are distilled into a model that takes lower-magnification images as input. However, this approach fails to transfer knowledge about the image regions that are most discriminative for classification, which may be lost when the resolution is decreased. In this work, we propose to distill this information by incorporating attention maps during training. Our formulation leverages saliency maps of the target class obtained via Grad-CAM, and guides the lower-resolution Student model to match the Teacher distribution by minimizing the L2 distance between the two. Comprehensive experiments on prostate histology image grading demonstrate that the proposed approach substantially improves model performance across different image resolutions compared to previous literature.
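The attention term described above can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation. It assumes the standard Grad-CAM definition (channel weights are the spatial means of the target-class gradients, followed by a ReLU over the weighted sum of activations). It also assumes that the Teacher and Student maps have already been brought to the same spatial size, and that each map is L2-normalized before comparison. The abstract does not specify these last two details. The function names grad_cam, l2_normalize, and attention_distillation_loss are hypothetical.

```python
import math


def grad_cam(activations, gradients):
    """Grad-CAM map from K feature channels and their target-class gradients.

    Both arguments are nested lists of shape K x H x W.
    """
    n_channels = len(activations)
    height = len(activations[0])
    width = len(activations[0][0])
    # Channel weight: spatial mean of the gradient for that channel.
    weights = [sum(sum(row) for row in g) / (height * width) for g in gradients]
    # Weighted sum over channels, then ReLU to keep positive evidence only.
    return [
        [
            max(0.0, sum(weights[k] * activations[k][i][j] for k in range(n_channels)))
            for j in range(width)
        ]
        for i in range(height)
    ]


def l2_normalize(cam, eps=1e-8):
    """Scale a map to (approximately) unit L2 norm. Assumption, not from the source."""
    norm = math.sqrt(sum(v * v for row in cam for v in row))
    return [[v / (norm + eps) for v in row] for row in cam]


def attention_distillation_loss(teacher_cam, student_cam):
    """L2 distance between normalized Teacher and Student maps of equal size."""
    t = l2_normalize(teacher_cam)
    s = l2_normalize(student_cam)
    return math.sqrt(
        sum((tv - sv) ** 2 for t_row, s_row in zip(t, s) for tv, sv in zip(t_row, s_row))
    )


# Toy example: two channels on a 2 x 2 grid (values are made up for illustration).
teacher_act = [[[1.0, 0.0], [0.0, 2.0]], [[0.0, 1.0], [1.0, 0.0]]]
teacher_grad = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
teacher_cam = grad_cam(teacher_act, teacher_grad)

# A Student map that attends to the wrong corner yields a positive loss.
student_cam = [[2.0, 0.0], [0.0, 1.0]]
loss = attention_distillation_loss(teacher_cam, student_cam)
```

In a training setup this term would typically be added, with a weighting factor, to the classification and soft-label distillation losses, and the Teacher map would be treated as a fixed target. That combination is likewise an assumption here.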