CV · AI · LG · Jul 25, 2022

Black-box Few-shot Knowledge Distillation

arXiv:2207.12106v1 · 19 citations · h-index: 58 · Has Code
Originality: Incremental advance
AI Analysis

The paper addresses the challenge of training efficient student models in real-world scenarios where labeled data is scarce and the teacher's parameters are inaccessible due to privacy concerns. It represents an incremental improvement in few-shot knowledge distillation.

The paper tackles knowledge distillation with limited data and a black-box teacher by generating out-of-distribution synthetic images with MixUp and a conditional variational auto-encoder (CVAE). It reports that this approach significantly outperforms recent state-of-the-art few-shot and zero-shot KD methods on image classification tasks.
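The MixUp half of the data-expansion step can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' released code: the function name `mixup_images`, the Beta(α, α) mixing weights, and the default α = 1.0 are assumptions, and the CVAE generator is not shown. Unlike standard MixUp, no labels are mixed here, because in the black-box setting the teacher is queried for the labels of the synthetic images afterwards.

```python
import numpy as np

def mixup_images(images, num_synthetic, alpha=1.0, rng=None):
    """Build synthetic images as convex combinations of random pairs of real images.

    images: array of shape (n, H, W, C) with n >= 2 (the few unlabeled samples).
    alpha: Beta(alpha, alpha) parameter for the mixing weight; this value is an
    assumption, not taken from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = images.shape[0]
    if n < 2:
        raise ValueError("need at least two images to mix")
    # Choose two different source images for every synthetic sample.
    idx_a = rng.integers(0, n, size=num_synthetic)
    idx_b = (idx_a + rng.integers(1, n, size=num_synthetic)) % n
    # One mixing weight per sample, reshaped to broadcast over H, W, C.
    lam = rng.beta(alpha, alpha, size=num_synthetic)
    lam = lam.reshape((-1,) + (1,) * (images.ndim - 1))
    return lam * images[idx_a] + (1.0 - lam) * images[idx_b]

rng = np.random.default_rng(0)
few_shot = rng.random((10, 32, 32, 3))  # stand-in for a handful of real images
synthetic = mixup_images(few_shot, num_synthetic=100, rng=rng)
print(synthetic.shape)  # (100, 32, 32, 3)
```

Mixing two images that the teacher would assign to different classes tends to produce inputs that lie off the real data distribution, which is the kind of out-of-distribution diversity the summary describes.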

Knowledge distillation (KD) is an efficient approach to transfer the knowledge from a large "teacher" network to a smaller "student" network. Traditional KD methods require many labeled training samples and a white-box teacher (whose parameters are accessible) to train a good student. However, these resources are not always available in real-world applications. The distillation process often happens on an external party's side, where we do not have access to much data, and the teacher does not disclose its parameters due to security and privacy concerns. To overcome these challenges, we propose a black-box few-shot KD method to train the student with few unlabeled training samples and a black-box teacher. Our main idea is to expand the training set by generating a diverse set of out-of-distribution synthetic images using MixUp and a conditional variational auto-encoder. These synthetic images, along with their labels obtained from the teacher, are used to train the student. We conduct extensive experiments to show that our method significantly outperforms recent state-of-the-art (SOTA) few/zero-shot KD methods on image classification tasks. The code and models are available at: https://github.com/nphdang/FS-BBT
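A toy end-to-end sketch of the black-box pipeline described in the abstract is given below, assuming a teacher that exposes only class probabilities. Everything in it is hypothetical: `make_black_box_teacher` hides a random linear model behind a query function, the student is a softmax-regression model trained by full-batch gradient descent on cross-entropy against the teacher's soft labels, and only MixUp-style expansion is included (the CVAE is omitted). The paper's actual architectures, loss, and label format may differ; if the teacher returned only hard labels, one-hot targets would replace `soft_labels`.

```python
import numpy as np

def softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)

def make_black_box_teacher(num_features, num_classes, seed=0):
    """Hypothetical stand-in teacher: callers get probabilities, never weights."""
    hidden_w = np.random.default_rng(seed).normal(size=(num_features, num_classes))
    def query(x):
        return softmax(x @ hidden_w)
    return query

def cross_entropy(probs, targets):
    # Mean cross-entropy between soft targets and predicted probabilities.
    return float(-(targets * np.log(probs + 1e-12)).sum(axis=1).mean())

def train_student(x, targets, lr=0.1, epochs=300):
    """Toy softmax-regression student trained by full-batch gradient descent."""
    n, d = x.shape
    w = np.zeros((d, targets.shape[1]))
    b = np.zeros(targets.shape[1])
    for _ in range(epochs):
        probs = softmax(x @ w + b)
        grad_logits = (probs - targets) / n  # d(mean CE)/d(logits)
        w -= lr * (x.T @ grad_logits)
        b -= lr * grad_logits.sum(axis=0)
    return w, b

rng = np.random.default_rng(1)
num_features, num_classes = 20, 5
teacher = make_black_box_teacher(num_features, num_classes)
few_shot = rng.normal(size=(10, num_features))  # a few unlabeled real samples

# Expand the training set with MixUp-style combinations (CVAE step omitted).
i, j = rng.integers(0, len(few_shot), size=(2, 500))
lam = rng.beta(1.0, 1.0, size=(500, 1))
train_x = np.vstack([few_shot, lam * few_shot[i] + (1 - lam) * few_shot[j]])

soft_labels = teacher(train_x)  # the only access we have to the teacher
w, b = train_student(train_x, soft_labels)
student_probs = softmax(train_x @ w + b)
print("student CE:", cross_entropy(student_probs, soft_labels))
print("agreement:", (student_probs.argmax(1) == soft_labels.argmax(1)).mean())
```

The black-box constraint is visible in the structure: `teacher` is only ever called, never inspected, so `train_student` depends on the teacher's outputs alone and not on its parameters.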

Code Implementations: 1 repo