Covariance Descriptors Meet General Vision Encoders: Riemannian Deep Learning for Medical Image Classification
This work addresses medical image classification for healthcare applications, but its contribution is incremental: it adapts existing covariance descriptor methods to new data and encoders.
The paper tackles medical image classification by combining covariance descriptors with pre-trained general vision encoders. On the MedMNIST benchmark, descriptors built from encoder features consistently outperform handcrafted descriptors, and SPDNet combined with DINOv2 features outperforms state-of-the-art methods.
Covariance descriptors capture second-order statistics of image features. They have shown strong performance in general computer vision tasks, but remain underexplored in medical imaging. We investigate their effectiveness for both conventional and learning-based medical image classification, with a particular focus on SPDNet, a classification network specifically designed for symmetric positive definite (SPD) matrices. We propose constructing covariance descriptors from features extracted by pre-trained general vision encoders (GVEs) and compare them with handcrafted descriptors. Two GVEs, DINOv2 and MedSAM, are evaluated across eleven binary and multi-class datasets from the MedMNIST benchmark. Our results show that covariance descriptors derived from GVE features consistently outperform those derived from handcrafted features. Moreover, SPDNet yields superior performance to state-of-the-art methods when combined with DINOv2 features. Our findings highlight the potential of combining covariance descriptors with powerful pre-trained vision encoders for medical image analysis.
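To make the notion of a covariance descriptor concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes an image is represented by a set of local feature vectors (here random numbers stand in for encoder patch features; the shapes 256 x 16 are arbitrary), computes their sample covariance, and adds a small ridge `eps` so the result is strictly SPD. The function names, the `eps` value, and the log-Euclidean vectorization (one common way to pass SPD matrices to a conventional classifier) are illustrative assumptions not taken from the text above.

```python
import numpy as np


def covariance_descriptor(features, eps=1e-5):
    """Sample covariance of n local feature vectors, shape (n, d) -> (d, d).

    eps is a hypothetical regularization constant that keeps the matrix
    strictly positive definite even when n is small relative to d.
    """
    features = np.asarray(features, dtype=np.float64)
    n, d = features.shape
    centered = features - features.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / max(n - 1, 1)
    return cov + eps * np.eye(d)


def log_euclidean_vector(spd):
    """Matrix logarithm of an SPD matrix, flattened to its upper triangle.

    Off-diagonal entries are scaled by sqrt(2) so the Euclidean norm of the
    vector equals the Frobenius norm of the log-matrix.
    """
    eigvals, eigvecs = np.linalg.eigh(spd)
    log_spd = (eigvecs * np.log(eigvals)) @ eigvecs.T
    rows, cols = np.triu_indices(spd.shape[0])
    weights = np.where(rows == cols, 1.0, np.sqrt(2.0))
    return weights * log_spd[rows, cols]


# Stand-in for patch features from a vision encoder (assumed shapes).
rng = np.random.default_rng(0)
patch_features = rng.standard_normal((256, 16))

C = covariance_descriptor(patch_features)
vec = log_euclidean_vector(C)
```

Here `C` is a 16 x 16 SPD matrix that could be passed to a network operating on SPD matrices, while `vec` is a flat vector of length 16 * 17 / 2 = 136 suitable for a standard Euclidean classifier.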