CV | Aug 9, 2024

UNIC: Universal Classification Models via Multi-teacher Distillation

arXiv:2408.05088v1 | 22 citations | h-index: 35
Originality: Incremental advance
AI Analysis

This work addresses the need for more versatile classification models in machine learning, though it is incremental as it builds on existing distillation techniques.

The paper tackles the problem of learning a single encoder that generalizes across multiple classification tasks by distilling knowledge from complementary pretrained models, resulting in a student model that matches or exceeds the performance of the best teacher for each task.

Pretrained models have become a commodity and offer strong results on a broad range of tasks. In this work, we focus on classification and seek to learn a single encoder able to draw on several complementary pretrained models. We aim at even stronger generalization across a variety of classification tasks. We propose to learn such an encoder via multi-teacher distillation. We first thoroughly analyse standard distillation when driven by multiple strong teachers with complementary strengths. Guided by this analysis, we gradually propose improvements to the basic distillation setup. Among those, we enrich the architecture of the encoder with a ladder of expendable projectors, which increases the impact of intermediate features during distillation, and we introduce teacher dropping, a regularization mechanism that better balances the teachers' influence. Our final distillation strategy leads to student models of the same capacity as any of the teachers, while retaining or improving upon the performance of the best teacher for each task. Project page and code: https://europe.naverlabs.com/unic
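
To make the idea of teacher dropping more concrete, here is a minimal pure-Python sketch of a multi-teacher distillation loss. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the loss or the dropping rule, so the cosine loss, the rule "always keep the teacher with the highest loss and drop each other teacher with probability drop_prob", and all names (cosine_loss, teacher_drop_mask, multi_teacher_loss, drop_prob) are hypothetical. The student is assumed to produce one projected feature vector per teacher.

```python
import math
import random


def cosine_loss(student_vec, teacher_vec):
    """Return 1 - cosine similarity of two non-zero, equal-length vectors."""
    dot = sum(s * t for s, t in zip(student_vec, teacher_vec))
    norm_s = math.sqrt(sum(s * s for s in student_vec))
    norm_t = math.sqrt(sum(t * t for t in teacher_vec))
    return 1.0 - dot / (norm_s * norm_t)


def teacher_drop_mask(losses, drop_prob, rng):
    """Return a 0/1 weight per teacher.

    Assumed rule (not stated in the abstract): the teacher with the
    highest current loss is always kept; every other teacher is dropped
    independently with probability drop_prob.
    """
    hardest = max(range(len(losses)), key=lambda i: losses[i])
    return [
        1.0 if i == hardest or rng.random() >= drop_prob else 0.0
        for i in range(len(losses))
    ]


def multi_teacher_loss(student_outputs, teacher_outputs, drop_prob=0.5, rng=None):
    """Sum per-teacher losses, masked by teacher dropping.

    student_outputs[i] is the student's feature projected for teacher i;
    teacher_outputs[i] is teacher i's own feature for the same input.
    """
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is reproducible
    losses = [cosine_loss(s, t) for s, t in zip(student_outputs, teacher_outputs)]
    mask = teacher_drop_mask(losses, drop_prob, rng)
    total = sum(m * l for m, l in zip(mask, losses))
    return total, mask
```

With drop_prob set to 0 every teacher contributes at each step, which corresponds to plain multi-teacher distillation; raising drop_prob concentrates the update on whichever teacher the student currently matches worst, which is one plausible way to balance the teachers' influence.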
