LG AI · May 18

Balancing Knowledge Distillation for Imbalance Learning with Bilevel Optimization

Anh B. H. Nguyen, Ba Tho Phan, Viet Cuong Ta

arXiv:2605.17839 17.5

AI Analysis

For practitioners dealing with long-tailed classification, BiKD provides a more adaptive distillation approach that improves student performance without requiring manual tuning of loss weights.

BiKD introduces a bilevel optimization framework that dynamically generates per-sample weights for balancing hard and soft losses in knowledge distillation on imbalanced data, outperforming recent balanced distillation methods on long-tailed CIFAR-10/100 across various imbalance factors.
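To make "imbalance factor" concrete, here is a minimal sketch of how long-tailed class sizes are often built for CIFAR-style benchmarks. It assumes the common convention that the imbalance factor is the ratio between the largest and smallest class, with sizes decaying exponentially in between; the listing does not state which construction the paper uses, and the function name `long_tailed_counts` is hypothetical.

```python
def long_tailed_counts(n_max, num_classes, imbalance_factor):
    """Per-class sample counts under an exponential long-tail profile.

    Assumption (not from the paper): class i keeps
    n_max * (1 / imbalance_factor) ** (i / (num_classes - 1)) samples,
    so the head class has n_max samples and the tail class has
    n_max / imbalance_factor.
    """
    counts = []
    for i in range(num_classes):
        frac = i / (num_classes - 1)
        counts.append(round(n_max * (1.0 / imbalance_factor) ** frac))
    return counts


# Example: 10 classes, 5000 images in the head class, imbalance factor 100.
counts = long_tailed_counts(5000, 10, 100)
print(counts)
```

A larger imbalance factor steepens the decay, which leaves the tail classes with very few samples to learn from.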

Knowledge distillation transfers knowledge from a high-capacity teacher to a compact student using a mixture of hard and soft losses. On imbalanced data, a fixed weighting between hard and soft losses becomes brittle during the learning process. Recent studies try to reweight these components in long-tailed settings. However, most of these methods do not adapt weights at the sample level and do not take the student's behavior during training into account. To address this, we propose BiKD, a bilevel framework that dynamically balances hard and soft losses for each sample. We employ a weight generation network that produces adaptive per-sample weights, guided by a small balanced validation set. The student is then trained with an unconstrained combination of weighted hard and soft losses, allowing the student to relax both terms. We further propose a multi-step SGD strategy to optimize the weight model more accurately and efficiently. Experiments on long-tailed CIFAR-10/100 show that our approach surpasses recent balanced distillation methods across imbalance factors.
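The per-sample combination of hard and soft losses described in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation. The hard loss is taken to be cross-entropy against the label and the soft loss a temperature-scaled KL divergence from teacher to student. The per-sample weights are fixed placeholder numbers here, whereas in BiKD they would come from the weight generation network trained in the outer loop of the bilevel problem. All function names are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def hard_loss(student_logits, label):
    """Cross-entropy between the student prediction and the true label."""
    return -math.log(softmax(student_logits)[label])


def soft_loss(student_logits, teacher_logits, temperature):
    """KL(teacher || student) on temperature-softened distributions.

    The usual temperature**2 rescaling is omitted for brevity.
    """
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    return sum(t * (math.log(t) - math.log(s))
               for t, s in zip(p_t, p_s) if t > 0)


def weighted_kd_loss(batch, weights, temperature=4.0):
    """Mean of w_hard * hard + w_soft * soft over the batch.

    `weights` holds one (w_hard, w_soft) pair per sample. The pair is
    unconstrained: it need not sum to 1, so both terms can be relaxed.
    """
    total = 0.0
    for (s_logits, t_logits, label), (w_hard, w_soft) in zip(batch, weights):
        total += (w_hard * hard_loss(s_logits, label)
                  + w_soft * soft_loss(s_logits, t_logits, temperature))
    return total / len(batch)


# Two samples: (student logits, teacher logits, label).
batch = [
    ([2.0, 0.5, -1.0], [3.0, 0.0, -2.0], 0),   # head-class sample
    ([0.2, 0.1, 0.3], [-1.0, 0.5, 2.5], 2),    # tail-class sample
]
# Placeholder weights; BiKD would generate these per sample.
weights = [(1.0, 0.5), (0.3, 1.2)]
loss = weighted_kd_loss(batch, weights)
print(loss)
```

Keeping the two weights independent, rather than as `alpha` and `1 - alpha`, is what lets a sample down-weight both terms at once, which a fixed convex mixture cannot express.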
