CV | Aug 9, 2022

How Well Do Vision Transformers (VTs) Transfer To The Non-Natural Image Domain? An Empirical Study Involving Art Classification

arXiv:2208.04693v1 | 4.88 citations | h-index: 8 | Has Code

Originality: Synthesis-oriented

AI Analysis

This study addresses the transfer learning capabilities of VTs, a topic relevant to researchers and practitioners in computer vision. Its contribution is incremental, since it builds on existing comparisons between VTs and CNNs.

The study investigated whether Vision Transformers (VTs) pre-trained on ImageNet can transfer effectively to non-natural image domains, specifically art classification, and found that VTs exhibit strong generalization and are more powerful feature extractors than CNNs.

Vision Transformers (VTs) are becoming a valuable alternative to Convolutional Neural Networks (CNNs) when it comes to problems involving high-dimensional and spatially organized inputs such as images. However, their Transfer Learning (TL) properties are not yet well studied, and it is not fully known whether these neural architectures can transfer across different domains as well as CNNs. In this paper we study whether VTs that are pre-trained on the popular ImageNet dataset learn representations that are transferable to the non-natural image domain. To do so we consider three well-studied art classification problems and use them as a surrogate for studying the TL potential of four popular VTs. Their performance is extensively compared against that of four common CNNs across several TL experiments. Our results show that VTs exhibit strong generalization properties and that these networks are more powerful feature extractors than CNNs.

