IV CV | Jul 15, 2025

Focus on Texture: Rethinking Pre-training in Masked Autoencoders for Medical Image Classification

Chetan Madan, Aarjav Satia, Soumen Basu, Pankaj Gupta, Usha Dutta, Chetan Arora

arXiv:2507.10869v18.61 citations | h-index: 37 | Has Code | MICCAI

Originality: Incremental advance

AI Analysis

This work addresses a domain-specific problem in medical imaging: improving classification accuracy for diseases such as cancer and pneumonia. It is an incremental advance, since it adapts the existing MAE framework with a novel loss function.

The paper tackles the problem of masked autoencoders (MAEs) failing to preserve texture cues that are crucial for medical image classification. It proposes GLCM-MAE, a pre-training framework whose reconstruction loss is based on the Gray Level Co-occurrence Matrix (GLCM), to improve representation learning. It reports gains over the prior state of the art across multiple medical imaging tasks, such as 3.1% for breast cancer detection and 0.6% for COVID detection.
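For readers unfamiliar with the GLCM, the following is a minimal sketch of the standard (non-differentiable) co-occurrence matrix, not code from the paper. The function name `glcm`, the `offset` parameter, and the toy images are illustrative assumptions. It shows that two images with identical gray-level histograms can still have very different GLCMs, which is why the matrix is used as a texture descriptor.

```python
def glcm(image, levels, offset=(0, 1)):
    """Normalized gray level co-occurrence matrix for one pixel offset.

    image  : list of rows of integer gray levels in [0, levels)
    offset : (row step, column step) from a pixel to its paired neighbor
    Entry [i][j] is the fraction of pixel pairs whose first pixel has
    level i and whose neighbor at `offset` has level j.
    """
    dr, dc = offset
    rows, cols = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
                total += 1
    if total == 0:
        raise ValueError("image too small for this offset")
    return [[n / total for n in row] for row in counts]


# Two hypothetical 4x4 binary images with the same histogram
# (eight 0s and eight 1s) but different texture.
checker = [[(r + c) % 2 for c in range(4)] for r in range(4)]
blocks = [[0, 0, 1, 1] for _ in range(4)]

p_checker = glcm(checker, levels=2)  # horizontal neighbors always differ
p_blocks = glcm(blocks, levels=2)    # horizontal neighbors mostly agree
```

In the checkerboard every horizontal pair mixes the two levels, so all the mass sits off the diagonal; in the block image most of it sits on the diagonal.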

Masked Autoencoders (MAEs) have emerged as a dominant strategy for self-supervised representation learning in natural images, where models are pre-trained to reconstruct masked patches with a pixel-wise mean squared error (MSE) between original and reconstructed RGB values as the loss. We observe that MSE encourages blurred image reconstruction, but still works for natural images as it preserves dominant edges. However, in medical imaging, where texture cues are more important for classifying a visual abnormality, the strategy fails. Taking inspiration from Gray Level Co-occurrence Matrix (GLCM) features in radiomics studies, we propose a novel MAE-based pre-training framework, GLCM-MAE, using a reconstruction loss based on matching GLCMs. The GLCM captures intensity and spatial relationships in an image, hence the proposed loss helps preserve morphological features. Further, we propose a novel formulation to convert GLCM matching into a differentiable loss function. We demonstrate that unsupervised pre-training on medical images with the proposed GLCM loss improves representations for downstream tasks. GLCM-MAE outperforms the current state-of-the-art across four tasks: gallbladder cancer detection from ultrasound images by 2.1%, breast cancer detection from ultrasound by 3.1%, pneumonia detection from x-rays by 0.5%, and COVID detection from CT by 0.6%. Source code and pre-trained models are available at: https://github.com/ChetanMadan/GLCM-MAE.
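The abstract does not spell out the differentiable formulation, so the sketch below is only one common way to relax a co-occurrence matrix and should not be read as the authors' method. It assumes Gaussian soft binning of each pixel into gray levels (the names `soft_glcm`, `glcm_loss`, `sigma`, and the toy images are all hypothetical) and compares it with plain MSE on a checkerboard target. Because every operation is smooth in the pixel values, an autodiff framework could in principle backpropagate through the same computation.

```python
import math


def soft_glcm(image, levels, sigma=0.5, offset=(0, 1)):
    """Soft co-occurrence matrix for float pixels in [0, levels - 1].

    Assumption: each pixel is spread over all gray levels with
    normalized Gaussian weights instead of a hard bin assignment.
    """
    def weights(v):
        w = [math.exp(-((v - k) ** 2) / (2 * sigma ** 2)) for k in range(levels)]
        s = sum(w)
        return [x / s for x in w]

    dr, dc = offset
    rows, cols = len(image), len(image[0])
    soft = [[weights(v) for v in row] for row in image]
    acc = [[0.0] * levels for _ in range(levels)]
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                a, b = soft[r][c], soft[r2][c2]
                for i in range(levels):
                    for j in range(levels):
                        acc[i][j] += a[i] * b[j]  # outer product of soft bins
                total += 1
    return [[x / total for x in row] for row in acc]


def glcm_loss(target, recon, levels, sigma=0.5):
    """Squared difference between the two soft GLCMs (illustrative choice)."""
    pt = soft_glcm(target, levels, sigma)
    pr = soft_glcm(recon, levels, sigma)
    return sum((pt[i][j] - pr[i][j]) ** 2
               for i in range(levels) for j in range(levels))


def mse(a, b):
    """Pixel-wise mean squared error between two equally sized images."""
    n = len(a) * len(a[0])
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / n


# Toy target: a 4x4 checkerboard texture with two gray levels.
target = [[float((r + c) % 2) for c in range(4)] for r in range(4)]
blurred = [[0.5] * 4 for _ in range(4)]               # texture averaged away
inverted = [[1.0 - v for v in row] for row in target]  # same texture, shifted

mse_blurred, mse_inverted = mse(target, blurred), mse(target, inverted)
glcm_blurred = glcm_loss(target, blurred, levels=2)
glcm_inverted = glcm_loss(target, inverted, levels=2)
```

In this toy setting MSE prefers the flat gray reconstruction over the shifted checkerboard, while the GLCM-based term penalizes the flat image and is close to zero for the shifted texture. The example also shows that a GLCM term by itself ignores where the texture sits; it says nothing about how the paper combines or weights its loss terms.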

