MetaDistiller: Network Self-Boosting via Meta-Learned Top-Down Distillation
This work addresses the computational cost of, and performance gaps in, model compression, a concern for machine learning practitioners, though it appears incremental because it builds on existing distillation methods.
The paper tackles the inefficiency and compatibility issues of Knowledge Distillation by proposing MetaDistiller, which uses meta-learning to optimize a label generator for top-down feature fusion and achieves better self-boosting than prior state-of-the-art methods on the CIFAR-100 and ILSVRC2012 benchmarks.
Knowledge Distillation (KD) has been one of the most popular methods for learning a compact model. However, it still suffers from high demands on time and computational resources, caused by its sequential training pipeline. Furthermore, the soft targets from deeper models often do not serve as good cues for shallower models because of a compatibility gap. In this work, we consider these two problems at the same time. Specifically, we propose that better soft targets with higher compatibility can be generated by using a label generator to fuse the feature maps from deeper stages in a top-down manner, and that this label generator can be optimized with meta-learning. Using the soft targets learned from the intermediate feature maps of the model, we achieve better self-boosting of the network than state-of-the-art methods. Experiments are conducted on two standard classification benchmarks, CIFAR-100 and ILSVRC2012, and we test various network architectures to show the generalizability of MetaDistiller. The experimental results on both datasets strongly demonstrate the effectiveness of our method.
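To make the abstract's ingredients concrete, here is a minimal pure-Python sketch under stated assumptions; it is not the authors' implementation. The temperature-softened targets and the T²-scaled KL term follow standard knowledge distillation (Hinton et al.). `top_down_soft_target` is a hypothetical stand-in for the label generator: it blends per-stage logits from the deepest stage upward with a fixed weight `mix`, whereas MetaDistiller learns its fusion over feature maps. `meta_step` is a scalar toy of a one-step lookahead meta-gradient, assumed here only to illustrate how a target-producing parameter can be tuned through a virtual student update.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Temperature-scaled log-softmax, computed stably via log-sum-exp."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_total = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_total for z in scaled]


def softmax(logits, temperature=1.0):
    """Higher temperature gives softer (flatter) targets."""
    return [math.exp(v) for v in log_softmax(logits, temperature)]


def top_down_soft_target(stage_logits, stage, mix=0.5, temperature=4.0):
    """Hypothetical label generator: start at the deepest stage and blend in
    each shallower stage that is still deeper than `stage`. This fixed logit
    blend is only a stand-in for a learned fusion over feature maps."""
    deeper = stage_logits[stage + 1:]
    if not deeper:
        raise ValueError("the deepest stage has no deeper stage to learn from")
    fused = list(deeper[-1])  # begin at the top (deepest) stage
    for logits in reversed(deeper[:-1]):  # move down toward shallower stages
        fused = [(1 - mix) * f + mix * z for f, z in zip(fused, logits)]
    return softmax(fused, temperature)


def distillation_loss(student_logits, soft_target, temperature=4.0):
    """Standard KD term: T^2 * KL(soft_target || student softmax at T)."""
    log_q = log_softmax(student_logits, temperature)
    kl = sum(p * (math.log(p) - lq) for p, lq in zip(soft_target, log_q) if p > 0)
    return temperature ** 2 * kl


def meta_step(theta, phi, y, inner_lr=0.1, outer_lr=0.5):
    """Scalar toy of a one-step lookahead meta-update (an assumption).
    Inner loss 0.5*(theta - phi)**2 pulls the student toward the generated
    target phi; outer loss 0.5*(theta_new - y)**2 uses the true label y, and
    its gradient reaches phi only through the virtual update theta_new."""
    theta_new = theta - inner_lr * (theta - phi)  # virtual student step
    grad_phi = (theta_new - y) * inner_lr         # chain rule: d theta_new / d phi = inner_lr
    return phi - outer_lr * grad_phi              # update the "generator" parameter


# Toy logits for three stages (shallow -> deep) on a 3-class problem.
stages = [[0.2, 0.1, 0.0], [1.0, 0.3, -0.2], [2.5, 0.5, -1.0]]
target = top_down_soft_target(stages, stage=0)
print(distillation_loss(stages[0], target) > 0)  # True
print(meta_step(0.0, 0.0, 1.0))  # 0.05
```

In this toy, the meta step moves `phi` toward the true label because the outer loss judges the target by how well the student does after training on it. A real setup would replace the fixed `mix` and the scalar `phi` with the label generator's weights and obtain the gradients from an autodiff framework.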