CV · May 13, 2025

CNN and ViT Efficiency Study on Tiny ImageNet and DermaMNIST Datasets

Aidar Amangeldi, Angsar Taigonyrov, Muhammad Huzaid Jawad, Chinedu Emmanuel Mbonu

arXiv:2505.08259v111 citations · h-index: 3

Originality: Synthesis-oriented

AI Analysis

This work addresses efficiency trade-offs for image classification in resource-constrained environments, but its contribution is incremental: it applies existing methods to new data.

This study evaluated convolutional and transformer architectures on medical and general image datasets, finding that fine-tuned Vision Transformers could match baseline performance while running inference faster and using fewer parameters.

This study evaluates the trade-offs between convolutional and transformer-based architectures on both medical and general-purpose image classification benchmarks. We use ResNet-18 as our baseline and introduce a fine-tuning strategy applied to four Vision Transformer variants (Tiny, Small, Base, Large) on DermatologyMNIST and TinyImageNet. Our goal is to reduce inference latency and model complexity with acceptable accuracy degradation. Through systematic hyperparameter variations, we demonstrate that appropriately fine-tuned Vision Transformers can match or exceed the baseline's performance, achieve faster inference, and operate with fewer parameters, highlighting their viability for deployment in resource-constrained environments.
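To make the model-complexity side of this comparison concrete, the following is a minimal sketch of how the parameter count of a plain ViT classifier can be estimated from its configuration. The widths and depths below are the commonly used Tiny/Small/Base/Large settings and are an assumption here, as are the 224-pixel input, the 16-pixel patch size, and the function name `vit_param_count`; the abstract does not state the study's exact configurations.

```python
# Assumed standard ViT settings (embedding width, number of blocks).
# These are illustrative; the study's exact configurations are not given.
VIT_CONFIGS = {
    "Tiny": {"dim": 192, "depth": 12},
    "Small": {"dim": 384, "depth": 12},
    "Base": {"dim": 768, "depth": 12},
    "Large": {"dim": 1024, "depth": 24},
}


def vit_param_count(dim, depth, num_classes, image_size=224,
                    patch_size=16, channels=3, mlp_ratio=4):
    """Estimate learnable parameters of a plain ViT classifier (hypothetical helper)."""
    num_patches = (image_size // patch_size) ** 2
    hidden = mlp_ratio * dim

    # Patch embedding: a linear projection of each flattened patch, with bias.
    patch_embed = channels * patch_size ** 2 * dim + dim
    # One class token plus a learned position embedding per token.
    cls_and_pos = dim + (num_patches + 1) * dim

    # Per block: fused QKV projection and output projection, both with bias.
    attention = (3 * dim * dim + 3 * dim) + (dim * dim + dim)
    # Per block: two-layer MLP with biases.
    mlp = (dim * hidden + hidden) + (hidden * dim + dim)
    # Per block: two LayerNorms, each with a scale and a shift vector.
    norms = 2 * 2 * dim
    block = attention + mlp + norms

    final_norm = 2 * dim
    head = dim * num_classes + num_classes
    return patch_embed + cls_and_pos + depth * block + final_norm + head


if __name__ == "__main__":
    # 200 classes for Tiny ImageNet, 7 for DermaMNIST.
    for name, cfg in VIT_CONFIGS.items():
        for dataset, classes in (("TinyImageNet", 200), ("DermaMNIST", 7)):
            count = vit_param_count(cfg["dim"], cfg["depth"], classes)
            print(f"ViT-{name:<5} {dataset:<12} {count / 1e6:6.2f}M parameters")
```

Under these assumptions, a 1000-class ViT-Base comes to 86,567,656 parameters, in line with the commonly cited figure of about 86M. The count is independent of the number of attention heads, since the heads only partition the same projection matrices.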
