High resolution weakly supervised localization architectures for medical images
This work addresses the need for more accurate weakly supervised localization in medical imaging, where only image-level annotations are available, and proposes a new method for this known bottleneck.
The paper tackles low localization accuracy in Class-Activation Maps (CAMs) for medical images. It attributes the problem to task mismatch, with Global Average Pooling and Group Normalization as the main causes, and proposes the Pyramid Localization Network (PYLON), which achieves 0.62 average point localization accuracy on the Chest X-Ray 14 dataset, compared to 0.45 for a traditional CAM model.
In medical imaging, the Class-Activation Map (CAM) serves as the main explainability tool by pointing to the region of interest. Since the localization accuracy of a CAM is constrained by the resolution of the model's feature map, one may expect that segmentation models, which generally have large feature maps, would produce more accurate CAMs. However, we have found that this is not the case due to task mismatch. While segmentation models are developed for datasets with pixel-level annotation, only image-level annotation is available in most medical imaging datasets. Our experiments suggest that Global Average Pooling (GAP) and Group Normalization are the main culprits that worsen the localization accuracy of CAM. To address this issue, we propose the Pyramid Localization Network (PYLON), a model for high-accuracy weakly supervised localization that achieved 0.62 average point localization accuracy on NIH's Chest X-Ray 14 dataset, compared to 0.45 for a traditional CAM model. Source code and extended results are available at https://github.com/cmb-chula/pylon.
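To make the link between feature-map resolution and localization concrete, the following is a minimal NumPy sketch of a standard CAM for a classifier that ends in GAP followed by a linear layer. It is not the PYLON implementation. The function names are hypothetical, and the hit criterion is an assumption: it counts a hit when the CAM's peak, mapped to the center of its feature-map cell, falls inside a ground-truth bounding box, which may differ from the paper's exact definition of point localization accuracy.

```python
import numpy as np


def class_activation_map(features, class_weights):
    """Weighted sum of feature channels for one class.

    features: array of shape (K, H, W), the final feature map.
    class_weights: array of shape (K,), the linear-layer weights for the class.
    Returns an (H, W) map.
    """
    return np.tensordot(class_weights, features, axes=([0], [0]))


def gap_logit(features, class_weights):
    """Class score of a GAP + linear head (bias omitted)."""
    return float(class_weights @ features.mean(axis=(1, 2)))


def point_hit(cam, box, image_size):
    """Assumed criterion: does the CAM peak land inside the box?

    box: (x_min, y_min, x_max, y_max) in image pixels.
    image_size: (height, width) of the input image.
    """
    h, w = cam.shape
    row, col = np.unravel_index(np.argmax(cam), cam.shape)
    # Map the peak cell to the center of the image region it covers.
    y = (row + 0.5) * image_size[0] / h
    x = (col + 0.5) * image_size[1] / w
    x_min, y_min, x_max, y_max = box
    return bool(x_min <= x <= x_max and y_min <= y <= y_max)


# Toy example: a 4x4 feature map on a 256x256 image, so each cell spans 64 px.
features = np.zeros((2, 4, 4))
features[0, 1, 2] = 1.0
weights = np.array([1.0, 0.0])

cam = class_activation_map(features, weights)
hit = point_hit(cam, box=(128, 64, 192, 128), image_size=(256, 256))
miss = point_hit(cam, box=(0, 0, 64, 64), image_size=(256, 256))
```

With a GAP head, the class score equals the spatial mean of the CAM, which is why the map can be read directly off the classifier. In this toy setup each CAM cell covers a 64 by 64 pixel patch, so the predicted point cannot be placed more precisely than that; a larger feature map shrinks the patch.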