LG · CV · Apr 25, 2022

Selective Cross-Task Distillation

arXiv:2204.11526v3 · 2 citations · h-index: 40
Originality: Incremental advance
AI Analysis

The paper addresses the challenge of leveraging diverse pre-trained models for knowledge distillation in machine learning; its contribution is an incremental improvement in efficiency and generalization.

The paper tackles the problem of efficiently selecting and reusing knowledge from pre-trained teachers for a new task via distillation, achieving effective cross-task knowledge transfer without exhaustive teacher evaluation.

The proliferation of pre-trained models empowers knowledge distillation by providing abundant teacher resources, but a well-developed mechanism for using these teachers adequately is still lacking. With a massive model repository composed of teachers pre-trained on diverse tasks, we must surmount two obstacles when using knowledge distillation to learn a new task. First, given a fixed computing budget, it is not affordable to try each teacher and train the student repeatedly, so the most contributive teacher must be identified precisely and efficiently. Second, semantic gaps exist between the teachers and the target student because they are trained on different tasks. Thus, we need to extract knowledge from a general label space that may differ from the student's. Faced with these two challenges, we study a new setting named selective cross-task distillation, which includes teacher assessment and generalized knowledge reuse. We bridge the teacher's label space and the student's label space through optimal transport. The transportation cost from the teacher's prediction to the student's prediction measures the relatedness between the two tasks and acts as an objective for distillation. Our method reuses cross-task knowledge from a distinct label space and efficiently assesses teachers without enumerating the model repository. Experiments demonstrate the effectiveness of our proposed method.
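
To make the optimal-transport idea concrete, here is a minimal sketch of a transport cost between a teacher's prediction and a student's prediction when the two label spaces differ. It is not the authors' implementation. The Sinkhorn iteration is a standard solver for entropy-regularized optimal transport that is swapped in here for illustration, and the function name `sinkhorn_cost`, the class-to-class `cost` matrix, and all numeric values are hypothetical assumptions, since the abstract does not specify how the cost is built.

```python
import numpy as np

def sinkhorn_cost(p, q, cost, reg=0.5, n_iters=500):
    """Entropy-regularized OT between a teacher prediction p and a
    student prediction q defined over different label spaces.

    p:    (n_teacher_classes,) probability vector
    q:    (n_student_classes,) probability vector
    cost: (n_teacher_classes, n_student_classes) ground cost (assumed given)
    Returns the transport cost <plan, cost> and the transport plan.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    cost = np.asarray(cost, dtype=float)

    K = np.exp(-cost / reg)           # Gibbs kernel
    u = np.ones_like(p)
    for _ in range(n_iters):          # alternate scaling to match both marginals
        v = q / (K.T @ u)
        u = p / (K @ v)
    plan = u[:, None] * K * v[None, :]
    return float(np.sum(plan * cost)), plan

# Hypothetical setup: teacher has 3 classes, student has 2 classes.
cost = np.array([[0.0, 1.0],
                 [0.2, 0.8],
                 [1.0, 0.1]])
teacher_pred = np.array([0.7, 0.2, 0.1])

# A student prediction that agrees with the teacher under this cost
# should be cheaper to reach than one that disagrees.
aligned_cost, _ = sinkhorn_cost(teacher_pred, np.array([0.8, 0.2]), cost)
misaligned_cost, _ = sinkhorn_cost(teacher_pred, np.array([0.1, 0.9]), cost)
print(aligned_cost < misaligned_cost)
```

In this sketch the same scalar plays both roles the abstract describes: averaged over data, a lower cost would suggest a more related teacher, and minimizing it with respect to the student's prediction would act as a distillation objective.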

