Feb 22, 2023

Debiased Distillation by Transplanting the Last Layer

arXiv:2302.11187v1 | 5 citations | h-index: 17
Originality: Incremental advance
AI Analysis

This work addresses spurious correlations in model compression, a problem relevant to practitioners. It is an incremental advance, since it builds on existing debiasing and distillation techniques.

The paper tackles the problem of student models in knowledge distillation becoming biased when trained on biased data. It proposes DeTT, which transplants the teacher's last layer into the student and reweights training samples while distilling the remaining layers, improving worst-group accuracy over the baselines.
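Worst-group accuracy, the metric the comparison relies on, is the lowest accuracy over predefined subgroups of the data rather than the overall average. The sketch below computes it in plain Python. It assumes, for illustration only, that a group is a (class label, bias attribute) pair; the toy data is hypothetical and not taken from the paper.

```python
from collections import defaultdict

def worst_group_accuracy(preds, labels, groups):
    """Return (min per-group accuracy, dict of per-group accuracies).

    Each entry of `groups` identifies a subgroup, here assumed to be a
    (class label, bias attribute) pair.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for pred, label, group in zip(preds, labels, groups):
        total[group] += 1
        correct[group] += int(pred == label)
    per_group = {g: correct[g] / total[g] for g in total}
    return min(per_group.values()), per_group

# Hypothetical toy data: the predictions simply follow the attribute
# ("x" -> 1, "y" -> 0), i.e. the model has learned a spurious shortcut.
labels = [1, 1, 0, 0, 0, 1]
groups = [(1, "x"), (1, "x"), (0, "y"), (0, "y"), (0, "x"), (1, "y")]
preds = [1, 1, 0, 0, 1, 0]

average = sum(p == y for p, y in zip(preds, labels)) / len(labels)
worst, per_group = worst_group_accuracy(preds, labels, groups)
print(f"average={average:.3f} worst-group={worst:.3f}")
```

On this toy data the average accuracy is about 0.667, but the worst-group accuracy is 0.0, because the two minority groups whose label disagrees with the shortcut are always misclassified. This gap is why debiasing methods report the worst-group number.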

Deep models are susceptible to learning spurious correlations, even during post-processing. We take a closer look at knowledge distillation, a popular post-processing technique for model compression, and find that distilling with biased training data gives rise to a biased student, even when the teacher is debiased. To address this issue, we propose a simple knowledge distillation algorithm, coined DeTT (Debiasing by Teacher Transplanting). Inspired by a recent observation that the last layer of a neural network plays an overwhelmingly important role in debiasing, DeTT directly transplants the teacher's last layer to the student. The remaining layers are distilled by matching the feature map outputs of the student and the teacher, with samples reweighted to mitigate the dataset bias. Importantly, DeTT does not rely on extensive annotations of the bias-related attribute, which are typically unavailable during the post-processing phase. Across our experiments, DeTT successfully debiases the student model, consistently outperforming the baselines in worst-group accuracy.
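The two ingredients named in the abstract, reusing the teacher's last layer and distilling the remaining layers with a reweighted feature-matching loss, can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Assumptions: features are plain lists of floats; the student's penultimate features have the same dimension as the teacher's, so the head can be copied unchanged; the loss is a weighted mean of squared L2 distances; and the weighting rule (upweighting samples that a biased reference model misclassifies, in the style of Just Train Twice) together with the factor `upweight=5.0` is a hypothetical stand-in, since the abstract does not say how DeTT chooses its weights.

```python
import copy

def linear(head, feat):
    """Apply a linear head {"W": rows of weights, "b": biases} to a feature vector."""
    return [sum(w * f for w, f in zip(row, feat)) + b
            for row, b in zip(head["W"], head["b"])]

def transplant_last_layer(teacher_head):
    # The student reuses the teacher's last layer unchanged (a copy, kept frozen).
    return copy.deepcopy(teacher_head)

def sample_weights(biased_preds, labels, upweight=5.0):
    # HYPOTHETICAL rule (JTT-style): samples a biased reference model gets wrong
    # are treated as likely bias-conflicting and upweighted; others get weight 1.
    return [upweight if p != y else 1.0 for p, y in zip(biased_preds, labels)]

def reweighted_feature_loss(student_feats, teacher_feats, weights):
    """Weighted mean of squared L2 distances between student and teacher features."""
    num = 0.0
    for fs, ft, w in zip(student_feats, teacher_feats, weights):
        num += w * sum((a - b) ** 2 for a, b in zip(fs, ft))
    return num / sum(weights)

# Hypothetical toy example: 3 samples, 2-D features, 2 classes.
teacher_head = {"W": [[1.0, 0.0], [0.0, 1.0]], "b": [0.0, 0.0]}
student_head = transplant_last_layer(teacher_head)

teacher_feats = [[2.0, 0.0], [0.0, 2.0], [1.0, 3.0]]
student_feats = [[2.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
labels = [0, 1, 1]
biased_preds = [0, 1, 0]  # the reference model misclassifies the third sample

weights = sample_weights(biased_preds, labels)
loss = reweighted_feature_loss(student_feats, teacher_feats, weights)
print(weights, loss)  # [1.0, 1.0, 5.0] 3.0
```

Here the squared distances are 0, 1, and 4, so the weighted loss is (0 + 1 + 5 * 4) / 7 = 3.0: the upweighted sample dominates the objective, pushing the student's features toward the teacher's on exactly the examples a biased model struggles with. Only the feature extractor would be trained on this loss; at inference the student applies the transplanted head, e.g. `linear(student_head, feat)`.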
