CV · Jan 23, 2024

Convolutional Initialization for Data-Efficient Vision Transformers

arXiv:2401.12511v1 · 4 citations · h-index: 8
Originality: Incremental advance
AI Analysis

This work addresses the challenge of data efficiency in vision transformers and offers a solution for applications with limited labeled data, though it appears incremental because it adapts existing insights from CNNs.

The paper tackles the problem of training vision transformers on small datasets by introducing a novel initialization strategy that reinterprets convolutional inductive bias as an initialization bias, achieving performance comparable to CNNs while preserving the transformer's architectural flexibility.

Training vision transformer networks on small datasets poses challenges. In contrast, convolutional neural networks (CNNs) can achieve state-of-the-art performance by leveraging their architectural inductive bias. In this paper, we investigate whether this inductive bias can be reinterpreted as an initialization bias within a vision transformer network. Our approach is motivated by the finding that random impulse filters can achieve performance nearly comparable to that of learned filters in CNNs. We introduce a novel initialization strategy for transformer networks that can achieve performance comparable to CNNs on small datasets while preserving the transformer's architectural flexibility.
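To make the "inductive bias as initialization bias" idea concrete, here is a minimal sketch of why random impulse filters map naturally onto attention. A random impulse filter is a k x k kernel that is zero except for a single 1, so convolving with it just shifts the feature map by one offset. The same shift can be written as an N x N attention matrix (N = number of patch tokens) whose rows are one-hot. The helper names (`random_impulse_offsets`, `impulse_attention_map`, `impulse_filter`, `conv2d_same`) are hypothetical, and the choice of one random offset per head is an assumption for illustration. This is not the authors' exact parameterization. Note also that a softmax can only approach one-hot rows, and it cannot produce the all-zero rows used here to mimic zero padding.

```python
import numpy as np


def random_impulse_offsets(num_heads, kernel_size, rng):
    """Pick one (dy, dx) offset per head inside a kernel_size x kernel_size window.
    (Assumption: one impulse per head, chosen uniformly at random.)"""
    r = kernel_size // 2
    return [tuple(int(v) for v in rng.integers(-r, r + 1, size=2)) for _ in range(num_heads)]


def impulse_filter(kernel_size, offset):
    """A k x k kernel that is zero everywhere except a single 1 at `offset`."""
    r = kernel_size // 2
    k = np.zeros((kernel_size, kernel_size))
    k[offset[0] + r, offset[1] + r] = 1.0
    return k


def conv2d_same(x, kernel):
    """Zero-padded 'same' cross-correlation, as used by deep-learning conv layers."""
    r = kernel.shape[0] // 2
    xp = np.pad(x, r)  # constant (zero) padding by default
    h, w = x.shape
    out = np.zeros_like(x, dtype=float)
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(kernel * xp[i:i + kernel.shape[0], j:j + kernel.shape[1]])
    return out


def impulse_attention_map(height, width, offset):
    """N x N attention matrix (N = height * width) in which token (i, j) attends
    only to token (i + dy, j + dx). Rows whose target lies outside the grid stay
    all-zero, mirroring zero padding in the convolution above."""
    dy, dx = offset
    n = height * width
    attn = np.zeros((n, n))
    for i in range(height):
        for j in range(width):
            si, sj = i + dy, j + dx
            if 0 <= si < height and 0 <= sj < width:
                attn[i * width + j, si * width + sj] = 1.0
    return attn


# Usage: each head's idealized attention map reproduces its impulse-filter convolution.
rng = np.random.default_rng(0)
H, W, K, HEADS = 5, 6, 3, 4
x = rng.standard_normal((H, W))  # one channel of a patch-token grid
for off in random_impulse_offsets(HEADS, K, rng):
    via_attention = (impulse_attention_map(H, W, off) @ x.reshape(-1)).reshape(H, W)
    via_conv = conv2d_same(x, impulse_filter(K, off))
    assert np.allclose(via_attention, via_conv)
```

The point of the check is that nothing about the transformer architecture changes: the convolution-like behavior lives entirely in the starting values of the attention maps, which training is then free to move away from.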

Code implementations: 1 repo
