CV · Sep 19, 2024

A Novel Perspective for Multi-modal Multi-label Skin Lesion Classification

Yuan Zhang, Yutong Xie, Hu Wang, Jodie C Avery, M Louise Hull, Gustavo Carneiro

arXiv:2409.12390v15.210 citations · h-index: 6

Originality: Highly original

AI Analysis

This work aims to improve diagnostic accuracy for skin diseases using multi-modal data. The contribution is incremental: it builds on existing deep learning methods and adds specific enhancements.

The paper tackles multi-modal multi-label skin lesion classification by introducing SkinM2Former, a transformer-based model with novel modules for multi-modal fusion and multi-label correlation learning. It achieves a mean average accuracy of 77.27% and a mean diagnostic accuracy of 77.85% on the Derm7pt dataset, outperforming SOTA methods.

The efficacy of deep learning-based Computer-Aided Diagnosis (CAD) methods for skin diseases relies on analyzing multiple data modalities (i.e., clinical+dermoscopic images, and patient metadata) and addressing the challenges of multi-label classification. Current approaches tend to rely on limited multi-modal techniques and treat the multi-label problem as a multiple multi-class problem, overlooking issues related to imbalanced learning and multi-label correlation. This paper introduces the innovative Skin Lesion Classifier, utilizing a Multi-modal Multi-label TransFormer-based model (SkinM2Former). For multi-modal analysis, we introduce the Tri-Modal Cross-attention Transformer (TMCT) that fuses the three image and metadata modalities at various feature levels of a transformer encoder. For multi-label classification, we introduce a multi-head attention (MHA) module to learn multi-label correlations, complemented by an optimisation that handles multi-label and imbalanced learning problems. SkinM2Former achieves a mean average accuracy of 77.27% and a mean diagnostic accuracy of 77.85% on the public Derm7pt dataset, outperforming state-of-the-art (SOTA) methods.
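The cross-attention idea behind the fusion step can be illustrated with a minimal sketch. This is not the paper's TMCT module: it is plain scaled dot-product attention in pure Python, with no learned projections, no multiple heads, and no multi-level fusion. The token values and the names `cross_attention`, `clinical`, `dermoscopic`, and `metadata` are hypothetical and chosen only for illustration. It shows tokens from one modality acting as queries over tokens from the other two.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention.

    Each query token (from one modality) is replaced by a weighted
    average of the value tokens (from the other modalities), with
    weights given by query-key similarity.
    """
    d = len(keys[0])  # feature dimension shared by queries and keys
    fused = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
            for k in keys
        ]
        weights = softmax(scores)
        fused.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return fused


# Toy 2-dimensional tokens for each modality (illustrative values only).
clinical = [[1.0, 0.0], [0.0, 1.0]]
dermoscopic = [[1.0, 1.0], [0.5, -0.5], [0.0, 2.0]]
metadata = [[0.2, 0.8]]

# Clinical tokens attend over the concatenated dermoscopic + metadata tokens.
context = dermoscopic + metadata
fused_clinical = cross_attention(clinical, context, context)
```

Because the attention weights are non-negative and sum to one, each fused clinical token is a convex combination of the context tokens. A real model would first apply learned linear projections to obtain queries, keys, and values.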
