CV | Apr 27, 2021

Self-distillation with Batch Knowledge Ensembling Improves ImageNet Classification

arXiv:2104.13298v2 | 35 citations
Originality: Incremental advance
AI Analysis

This work addresses the cost of knowledge ensembling in distillation for image classification, offering a lightweight single-network method that surpasses the single-network state of the art on the reported benchmarks.

The paper tackles the computational cost of ensembling-based knowledge distillation by introducing BAtch Knowledge Ensembling (BAKE), which refines soft targets using inter-sample affinities within each mini-batch. It reports a +0.7% gain for Swin-T on ImageNet with only +1.5% computational overhead and no extra parameters.

Recent studies of knowledge distillation have found that ensembling the "dark knowledge" from multiple teachers or students helps create better soft targets for training, but at the cost of significantly more computation and/or parameters. In this work, we present BAtch Knowledge Ensembling (BAKE) to produce refined soft targets for anchor images by propagating and ensembling the knowledge of the other samples in the same mini-batch. Specifically, for each sample of interest, the propagation of knowledge is weighted according to the inter-sample affinities, which are estimated on-the-fly with the current network. The propagated knowledge can then be ensembled to form a better soft target for distillation. In this way, our BAKE framework achieves online knowledge ensembling across multiple samples with only a single network. It requires minimal computational and memory overhead compared to existing knowledge ensembling methods. Extensive experiments demonstrate that the lightweight yet effective BAKE consistently boosts the classification performance of various architectures on multiple datasets, e.g., a significant +0.7% gain for Swin-T on ImageNet with only +1.5% computational overhead and zero additional parameters. BAKE not only improves the vanilla baselines, but also surpasses the single-network state of the art on all the benchmarks.
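To make the idea of affinity-weighted propagation concrete, here is a minimal pure-Python sketch of how a refined soft target might be built for each sample in a mini-batch. It is an illustration under stated assumptions, not the authors' implementation: the function name batch_ensembled_targets, the blending weight omega, the temperature value, the use of softmax-normalised cosine similarity as the affinity, and the single propagation step are all hypothetical choices made for this example.

```python
import math


def softmax(xs, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [x / temperature for x in xs]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def l2_normalize(v):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def batch_ensembled_targets(features, logits, omega=0.5, temperature=4.0):
    """Blend each sample's own prediction with its batch neighbours' predictions.

    Assumptions (not taken from the source): affinities are softmax-normalised
    cosine similarities to the *other* samples, and knowledge is propagated
    for a single step only.
    """
    n = len(features)
    feats = [l2_normalize(f) for f in features]
    probs = [softmax(z, temperature) for z in logits]
    targets = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        if not others:
            # A batch of one has nothing to propagate from.
            targets.append(list(probs[i]))
            continue
        # Affinity of sample i to every other sample in the batch.
        sims = [sum(a * b for a, b in zip(feats[i], feats[j])) for j in others]
        weights = softmax(sims)
        # Affinity-weighted average of the neighbours' predictions.
        propagated = [
            sum(w * probs[j][c] for w, j in zip(weights, others))
            for c in range(len(probs[i]))
        ]
        # Ensemble propagated knowledge with the sample's own prediction.
        targets.append(
            [omega * p + (1.0 - omega) * q for p, q in zip(propagated, probs[i])]
        )
    return targets


# Toy batch: samples 0 and 1 have similar features, sample 2 differs.
features = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]]
logits = [[2.0, 0.0, 0.0], [1.5, 0.2, 0.0], [0.0, 0.0, 2.0]]
targets = batch_ensembled_targets(features, logits)
```

In a training loop, targets like these would be treated as constants (no gradient) and compared against the network's own temperature-softened predictions with a distillation loss added to the usual cross-entropy term. Because every quantity comes from the current network's forward pass on the batch, no extra teacher or parameters are involved, which is consistent with the low overhead the abstract reports.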

