CV · Aug 9, 2022

How Well Do Vision Transformers (VTs) Transfer To The Non-Natural Image Domain? An Empirical Study Involving Art Classification

arXiv:2208.04693v1 · 8 citations · h-index: 8
Originality: Synthesis-oriented
AI Analysis

This paper addresses the transfer learning capabilities of VTs and is aimed at researchers and practitioners in computer vision. Its contribution is incremental, since it builds on existing comparisons between VTs and CNNs.

The study investigated whether Vision Transformers (VTs) pre-trained on ImageNet can transfer effectively to non-natural image domains, specifically art classification, and found that VTs exhibit strong generalization and are more powerful feature extractors than CNNs.

Vision Transformers (VTs) are becoming a valuable alternative to Convolutional Neural Networks (CNNs) when it comes to problems involving high-dimensional and spatially organized inputs such as images. However, their Transfer Learning (TL) properties are not yet well studied, and it is not fully known whether these neural architectures can transfer across different domains as well as CNNs. In this paper we study whether VTs that are pre-trained on the popular ImageNet dataset learn representations that are transferable to the non-natural image domain. To do so we consider three well-studied art classification problems and use them as a surrogate for studying the TL potential of four popular VTs. Their performance is extensively compared against that of four common CNNs across several TL experiments. Our results show that VTs exhibit strong generalization properties and that these networks are more powerful feature extractors than CNNs.
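
To make the "feature extractor" setting concrete, here is a minimal sketch of one common way to use a pre-trained network for transfer learning: the backbone is frozen and only a new classification head is trained on the target task. It is not the authors' code. It assumes PyTorch, and the tiny randomly initialised `backbone`, the `NUM_ART_CLASSES` value, and the random tensors standing in for artwork images are all hypothetical placeholders chosen so the example runs without downloading weights or data.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-in for an ImageNet-pre-trained backbone (a VT or a CNN).
# In practice this would be a real pre-trained network with its head removed.
backbone = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),  # produces (batch, 8) feature vectors
)

NUM_ART_CLASSES = 5  # assumed number of target classes, for illustration only
head = nn.Linear(8, NUM_ART_CLASSES)  # new classifier for the target task

# Feature-extractor setting: freeze every backbone parameter.
for p in backbone.parameters():
    p.requires_grad = False
backbone.eval()

# Only the head's parameters are handed to the optimizer.
optimizer = torch.optim.SGD(head.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

# Random tensors standing in for a batch of artwork images and labels.
images = torch.randn(16, 3, 32, 32)
labels = torch.randint(0, NUM_ART_CLASSES, (16,))

# Snapshots used afterwards to check which parameters were updated.
backbone_before = [p.clone() for p in backbone.parameters()]
head_before = head.weight.clone()

for _ in range(3):
    with torch.no_grad():  # no gradients flow into the frozen backbone
        features = backbone(images)
    logits = head(features)
    loss = loss_fn(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

backbone_unchanged = all(
    torch.equal(before, after)
    for before, after in zip(backbone_before, backbone.parameters())
)
head_changed = not torch.equal(head_before, head.weight)
```

Fine-tuning, by contrast, would leave `requires_grad` enabled on the backbone and pass its parameters to the optimizer as well, so the pre-trained representation itself adapts to the target domain.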

