CV | LG | Aug 16, 2023

SkinDistilViT: Lightweight Vision Transformer for Skin Lesion Classification

arXiv:2308.08669v1 | 25 citations | h-index: 13 | Has Code
Originality: Incremental advance
AI Analysis

The paper provides a production-specific solution for early skin cancer detection, though the advance is incremental because it builds on existing knowledge distillation and vision transformer methods.

The paper tackles skin lesion classification by developing a lightweight vision transformer that matches human performance in melanoma identification, achieving 98.33% of the teacher model's accuracy while reducing memory by 49.60% and speeding up inference by up to 97.96%.
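The accuracy figure quoted here is, per the abstract, a balanced multi-class accuracy. As a minimal sketch, assuming the common definition of that metric (the mean of per-class recall), the toy example below shows how it differs from plain accuracy on an imbalanced label set. The function name, labels, and predictions are hypothetical and are not taken from the paper or its repository.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical toy data: 8 benign lesions, 2 melanomas, one melanoma missed.
y_true = ["nevus"] * 8 + ["melanoma"] * 2
y_pred = ["nevus"] * 8 + ["nevus", "melanoma"]

plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
balanced = balanced_accuracy(y_true, y_pred)
print(f"plain accuracy: {plain:.2f}, balanced accuracy: {balanced:.2f}")
```

Because each class contributes equally to the average, a missed melanoma costs far more under the balanced metric than under plain accuracy, which is one reason it is a natural choice for imbalanced lesion datasets.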

Skin cancer is a treatable disease if discovered early. We provide a production-specific solution to the skin cancer classification problem that matches human performance in melanoma identification by training a vision transformer on melanoma medical images annotated by experts. Since inference cost, in both time and memory, is important in practice, we employ knowledge distillation to obtain a model that retains 98.33% of the teacher's balanced multi-class accuracy at a fraction of the cost. Memory-wise, our model is 49.60% smaller than the teacher. Time-wise, our solution is 69.25% faster on GPU and 97.96% faster on CPU. By adding classification heads at each level of the transformer and employing a cascading distillation process, we improve the balanced multi-class accuracy of the base model by 2.1%, while creating a range of models of various sizes but comparable performance. We provide the code at https://github.com/Longman-Stan/SkinDistilVit.
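The abstract does not spell out the distillation objective, so the following is only a sketch of the generic soft-target formulation of knowledge distillation (a temperature-softened KL term plus a hard-label cross-entropy term), not the authors' implementation. The function names, the `temperature` and `alpha` parameters, and the example logits are all assumptions made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    norm = sum(exps)
    return [e / norm for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Generic soft-target distillation loss for a single example.

    alpha weights the teacher-matching term; (1 - alpha) weights the
    ordinary cross-entropy against the ground-truth label index.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL(teacher || student) on the temperature-softened distributions.
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    # Cross-entropy of the student (temperature 1) against the hard label.
    hard = -math.log(softmax(student_logits)[label])
    # The T^2 factor keeps the soft-term gradient scale comparable across T.
    return alpha * temperature ** 2 * kl + (1 - alpha) * hard


# Hypothetical logits for a 3-class problem; the true class is index 2.
teacher = [0.5, 1.0, 3.0]
close_student = [0.4, 1.1, 2.9]
far_student = [3.0, 1.0, 0.5]
print(distillation_loss(close_student, teacher, label=2))
print(distillation_loss(far_student, teacher, label=2))
```

A student whose logits track the teacher's incurs a small loss, while one that disagrees is penalized by both terms. In a setup with classification heads at several depths, as the abstract describes, a loss of this kind could in principle be applied per head, but how the cascading process actually combines them is not stated here.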

Code Implementations: 1 repo
