CV LG Sep 30, 2023

LIB-KD: Teaching Inductive Bias for Efficient Vision Transformer Distillation and Compression

arXiv:2310.00369v4
h-index: 14
Originality: Incremental advance
AI Analysis

This work addresses the problem of making ViTs practical for applications by improving training efficiency, though it is incremental as it builds on existing distillation techniques.

The paper tackles the challenge of training Vision Transformers (ViTs) efficiently by introducing LIB-KD, an ensemble-based distillation method that distills inductive biases from lightweight teacher models built on operations such as convolution and involution. This improves student performance, and precomputing the logits reduces the computational burden of distillation.

With the rapid development of computer vision, Vision Transformers (ViTs) offer the tantalising prospect of unified information processing across visual and textual domains. However, because ViTs lack inherent inductive biases, they require enormous datasets for training. To make their application practical, we introduce an ensemble-based distillation approach that distils inductive bias from complementary lightweight teacher models. Prior systems relied solely on convolution-based teaching. Our method instead incorporates an ensemble of light teachers with different architectural tendencies, such as convolution and involution, to jointly instruct the student transformer. Because of these distinct inductive biases, the teachers can accumulate a wide range of knowledge, even from readily identifiable stored datasets, which leads to enhanced student performance. Our proposed framework, LIB-KD, also precomputes and stores logits (essentially the unnormalized predictions of the model) in advance. This optimisation accelerates distillation by eliminating the need for repeated forward passes during knowledge distillation, significantly reducing the computational burden.
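The two ideas in the abstract, an ensemble of teachers and logits computed once ahead of time, can be illustrated with a minimal pure-Python sketch. This is not the paper's implementation: the abstract does not state the loss, so the sketch assumes a standard temperature-scaled distillation loss with teacher probabilities averaged across the ensemble, and it assumes the cached logits are the teachers' outputs. All names (`precompute_teacher_logits`, `ensemble_soft_targets`, `distillation_loss`, the toy teachers, `temperature`, `alpha`) are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def precompute_teacher_logits(teachers, dataset):
    """Run every teacher once per sample and cache the raw logits by index."""
    return {
        idx: [teacher(x) for teacher in teachers]
        for idx, (x, _label) in enumerate(dataset)
    }


def ensemble_soft_targets(teacher_logits, temperature):
    """Average the teachers' softened probabilities (assumed combination rule)."""
    probs = [softmax(logits, temperature) for logits in teacher_logits]
    n_classes = len(probs[0])
    return [sum(p[c] for p in probs) / len(probs) for c in range(n_classes)]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Cross-entropy on the label plus KL divergence to the ensemble target."""
    target = ensemble_soft_targets(teacher_logits, temperature)
    student_soft = softmax(student_logits, temperature)
    kl = sum(t * (math.log(t) - math.log(s))
             for t, s in zip(target, student_soft) if t > 0)
    ce = -math.log(softmax(student_logits)[label])
    return alpha * ce + (1 - alpha) * temperature ** 2 * kl


# Toy stand-ins for a convolution-based and an involution-based teacher.
conv_teacher = lambda x: [2.0 * x, 0.0, -x]
inv_teacher = lambda x: [x, 0.5, -2.0 * x]

dataset = [(1.0, 0), (-1.0, 2)]  # (input, label) pairs
cache = precompute_teacher_logits([conv_teacher, inv_teacher], dataset)

# During student training only the cache is read; no teacher is called again.
loss = distillation_loss([1.5, 0.2, -1.0], cache[0], label=0)
```

In this sketch the teachers are evaluated exactly once per sample inside `precompute_teacher_logits`; every later training step reads `cache[idx]`, which is the source of the saving the abstract describes. In a real pipeline the cache would more likely be written to disk, and any data augmentation would have to be consistent with the inputs used when the logits were stored.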

