Multi-Scale Transformer Architecture for Accurate Medical Image Classification
This work addresses the need for more accurate diagnostic tools in medical imaging, specifically skin lesion classification. The contribution is incremental: it refines existing Transformer methods.
The study aims to improve the accuracy and robustness of skin lesion classification by introducing an enhanced Transformer architecture with multi-scale feature fusion. On the ISIC 2017 dataset, it reports better performance than established models such as ResNet50 and Vision Transformer on metrics including accuracy and AUC.
This study introduces an AI-driven skin lesion classification algorithm built on an enhanced Transformer architecture, addressing the challenges of accuracy and robustness in medical image analysis. By integrating a multi-scale feature fusion mechanism and refining the self-attention process, the model extracts both global and local features, which improves its ability to detect lesions with ambiguous boundaries and intricate structures. Evaluation on the ISIC 2017 dataset shows that the improved Transformer surpasses established models, including ResNet50, VGG19, ResNeXt, and Vision Transformer, on accuracy, AUC, F1-score, and precision. Grad-CAM visualizations support the interpretability of the model, showing strong alignment between the regions the algorithm attends to and the actual lesion sites. These results underscore the potential of advanced AI models in medical imaging to deliver more accurate and reliable diagnostic tools. Future work will explore how well the approach scales to broader medical imaging tasks and investigate the integration of multimodal data into AI-driven diagnostic frameworks for intelligent healthcare.
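The metrics named above (accuracy, AUC, F1-score, precision) have standard definitions, and a small sketch may make the comparison easier to interpret. The Python below is illustrative only: it assumes a binary task (one lesion class versus the rest) and a 0.5 decision threshold, and the helper names `binary_metrics` and `roc_auc` are hypothetical. None of this is taken from the study's own evaluation code, which is not described here.

```python
from typing import Dict, List, Sequence


def binary_metrics(
    y_true: Sequence[int], y_score: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 at a fixed threshold (assumed 0.5)."""
    y_pred: List[int] = [1 if s >= threshold else 0 for s in y_score]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is O(n_pos * n_neg), which is
    fine for a sketch but slower than rank-based implementations.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For example, with labels `[1, 0, 1, 1, 0, 0]` and scores `[0.9, 0.2, 0.6, 0.4, 0.7, 0.1]`, the thresholded metrics (accuracy, precision, F1) each come out to 2/3, while the AUC is 7/9. AUC is threshold-free and reflects ranking quality, which is one reason it is commonly reported alongside accuracy on imbalanced medical datasets.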