CV, May 25

CNNs, Transformers, Hybrid, and Vision Language Models for Skin Cancer Detection

arXiv:2605.26294
AI Analysis

Provides a reproducible benchmark and practical guidance for selecting model families for skin cancer screening, though the evaluation is limited to a single dataset.

This paper evaluates twelve deep learning models from four families (CNN, ViT, hybrid, VLM) on the PAD-UFES-20 dataset for binary skin cancer detection. Hybrid models (MaxViT Tiny, CoAtNet0) and a SigLIP-based VLM achieved the best trade-off between ranking performance and clinically relevant operating points.

Skin cancer is a common and fast-rising malignancy worldwide. Early detection is critical for improving outcomes. Deep learning models trained on dermoscopic and clinical images can support fast, automated triage. However, many studies evaluate only a limited set of architectures, and experimental setups vary across studies. In this paper, we present a unified evaluation of twelve deep learning models for binary skin cancer detection on the PAD-UFES-20 dataset. The models span four families: convolutional neural networks (CNN), vision transformers (ViT), hybrid convolution-transformer backbones, and vision-language models (VLM). Performance is assessed using AUC, the maximum F1 score with its corresponding precision and recall, and sensitivity at 80% specificity, reflecting screening-oriented requirements. Our results show that well-tuned CNNs already provide strong baselines, but transformer-based families consistently improve discrimination. Hybrid models (MaxViT Tiny, CoAtNet0) and a SigLIP-based VLM achieve the best overall trade-off between ranking performance and clinically relevant operating points, while a CLIP-based model offers high precision. The full codebase for all experiments is publicly released. Together, these findings offer practical guidance on which model families are most suitable for real-world deployment in skin cancer screening and establish a reproducible reference point for future work on PAD-UFES-20.
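
The three metrics named in the abstract (AUC, maximum F1 with its precision and recall, and sensitivity at 80% specificity) can be illustrated with a minimal pure-Python sketch. This is not the paper's released code: the function names, the toy labels and scores, and the convention that a score at or above the threshold counts as a positive prediction are all assumptions made for illustration.

```python
from typing import List, Tuple


def auc(labels: List[int], scores: List[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def confusion(labels: List[int], scores: List[float], thr: float) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn), predicting positive when score >= thr (assumed convention)."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= thr)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= thr)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < thr)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < thr)
    return tp, fp, fn, tn


def max_f1(labels: List[int], scores: List[float]) -> Tuple[float, float, float, float]:
    """Return (f1, precision, recall, threshold) at the F1-maximising threshold."""
    best = (0.0, 0.0, 0.0, float("inf"))
    for thr in sorted(set(scores)):
        tp, fp, fn, _ = confusion(labels, scores, thr)
        if tp == 0:
            continue  # F1 is zero when no positive is recovered
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best[0]:
            best = (f1, precision, recall, thr)
    return best


def sensitivity_at_specificity(labels: List[int], scores: List[float], target: float = 0.80) -> float:
    """Highest sensitivity among thresholds whose specificity is at least `target`."""
    best = 0.0
    # The infinite threshold predicts everything negative (specificity 1.0).
    for thr in sorted(set(scores)) + [float("inf")]:
        tp, fp, fn, tn = confusion(labels, scores, thr)
        if tn / (tn + fp) >= target:
            best = max(best, tp / (tp + fn))
    return best


# Hypothetical toy data: 1 = malignant, 0 = benign; scores are model probabilities.
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
scores = [0.10, 0.20, 0.30, 0.40, 0.70, 0.35, 0.60, 0.80, 0.90, 0.95]

print("AUC:", auc(labels, scores))
print("max F1 (f1, precision, recall, threshold):", max_f1(labels, scores))
print("sensitivity at 80% specificity:", sensitivity_at_specificity(labels, scores))
```

On this toy example the maximum-F1 threshold and the 80%-specificity threshold differ, which is why the abstract reports both: the first summarises the best precision-recall balance, while the second fixes the false-alarm rate at a level a screening workflow can tolerate and asks how many cancers are still caught.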
