CV | AI | Dec 7, 2021

Bootstrapping ViTs: Towards Liberating Vision Transformers from Pre-training

arXiv:2112.03552v4 | 18 citations | Has Code
Originality: Incremental advance
AI Analysis

The paper addresses the computational burden of pre-training vision Transformers, which would make them more practical for resource-constrained applications.

The paper tackles the tendency of vision Transformers (ViTs) to overfit on small datasets unless they are pre-trained at large scale. It reintroduces the inductive biases of convolutional neural networks (CNNs) into ViTs while preserving the ViT architecture. According to the authors, the resulting ViTs converge significantly faster and outperform conventional CNNs with fewer parameters on CIFAR-10/100 and on ImageNet-1k with limited training data.

Recently, vision Transformers (ViTs) have been developing rapidly and are starting to challenge the dominance of convolutional neural networks (CNNs) in computer vision (CV). With the general-purpose Transformer architecture replacing the hard-coded inductive biases of convolution, ViTs have surpassed CNNs, especially in data-sufficient circumstances. However, ViTs are prone to over-fit on small datasets and thus rely on large-scale pre-training, which consumes enormous time. In this paper, we strive to liberate ViTs from pre-training by introducing CNNs' inductive biases back to ViTs while preserving their network architectures for a higher upper bound, and by setting up more suitable optimization objectives. To begin with, an agent CNN with inductive biases is designed based on the given ViT. Then a bootstrapping training algorithm is proposed to jointly optimize the agent and the ViT with weight sharing, during which the ViT learns inductive biases from the intermediate features of the agent. Extensive experiments on CIFAR-10/100 and ImageNet-1k with limited training data have shown encouraging results: the inductive biases help ViTs converge significantly faster and outperform conventional CNNs with even fewer parameters. Our code is publicly available at https://github.com/zhfeing/Bootstrapping-ViTs-pytorch.
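The abstract describes joint optimization in which the ViT learns from the agent's intermediate features, but it does not give the objective itself. The sketch below is one plausible shape for such an objective, not the authors' formulation: it assumes both networks are supervised with cross-entropy and that a mean-squared-error term pulls the ViT's features towards the agent's. The function names, the `feat_weight` parameter, and the choice of MSE are all hypothetical, and weight sharing between the two networks is not modeled.

```python
import math


def cross_entropy(logits, label):
    """Softmax cross-entropy for a single example (numerically stable)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def feature_mse(vit_feats, agent_feats):
    """Mean squared error between two equally sized feature vectors."""
    if len(vit_feats) != len(agent_feats):
        raise ValueError("feature vectors must have the same length")
    squared = [(v - a) ** 2 for v, a in zip(vit_feats, agent_feats)]
    return sum(squared) / len(squared)


def joint_loss(vit_logits, agent_logits, vit_feats, agent_feats, label,
               feat_weight=1.0):
    """Hypothetical joint objective (an assumption, not the paper's loss):
    both networks are supervised by the label, and the ViT's intermediate
    features are pulled towards the agent CNN's features."""
    return (
        cross_entropy(vit_logits, label)
        + cross_entropy(agent_logits, label)
        + feat_weight * feature_mse(vit_feats, agent_feats)
    )


# Toy values: two classes, two-dimensional intermediate features.
loss = joint_loss(
    vit_logits=[0.0, 0.0],
    agent_logits=[0.0, 0.0],
    vit_feats=[1.0, 2.0],
    agent_feats=[1.0, 4.0],
    label=0,
)
print(round(loss, 4))
```

In this toy setting the feature term vanishes once the ViT's features match the agent's, leaving only the two classification terms. The repository linked in the abstract is the place to check how the authors actually define and weight their losses.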

Code Implementations: 1 repo
