Normalized Feature Distillation for Semantic Segmentation
This work addresses model compression for semantic segmentation. It offers an incremental improvement: a simpler form of feature distillation that needs no manually designed knowledge.
The paper addresses the limited performance gains of knowledge distillation for semantic segmentation by proposing normalized feature distillation (NFD). NFD normalizes features so that the student does not simply imitate the magnitude of the teacher's features, and it achieves state-of-the-art results on the Cityscapes, VOC 2012, and ADE20K datasets.
As a promising approach to model compression, knowledge distillation improves the performance of a compact model by transferring knowledge from a cumbersome one. The choice of knowledge used to guide the student's training is important. Previous distillation methods for semantic segmentation strive to extract various forms of knowledge from the features. These methods rely on elaborate manual design based on prior information, yet their performance gains are limited. In this paper, we propose a simple yet effective feature distillation method called normalized feature distillation (NFD), which enables effective distillation with the original features and removes the need to manually design new forms of knowledge. The key idea is to use normalization to prevent the student from focusing on imitating the magnitude of the teacher's feature responses. Our method achieves state-of-the-art distillation results for semantic segmentation on the Cityscapes, VOC 2012, and ADE20K datasets. Code will be available.
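To make the key idea concrete, the following is a minimal Python sketch of a normalized feature loss. The abstract does not specify which normalization NFD uses. This sketch assumes per-channel standardization (zero mean, unit variance over spatial positions) followed by mean squared error. The names `normalized_feature_loss` and `plain_feature_loss` and the list-of-channels layout are hypothetical, and the sketch assumes the student and teacher features already have matching shapes.

```python
import math
from typing import List

# Hypothetical layout: a feature map is a list of channels, and each channel
# is a flat list of its H*W spatial activations.
FeatureMap = List[List[float]]


def standardize_channel(values: List[float], eps: float = 1e-5) -> List[float]:
    """Rescale one channel to zero mean and (almost) unit variance over positions."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    std = math.sqrt(var + eps)  # eps guards against division by zero
    return [(v - mean) / std for v in values]


def normalize(feat: FeatureMap) -> FeatureMap:
    """Per-channel standardization (an assumed choice, not confirmed by the source)."""
    return [standardize_channel(ch) for ch in feat]


def _mse(a: FeatureMap, b: FeatureMap) -> float:
    """Mean squared error over all channels and positions; shapes must match."""
    if len(a) != len(b) or any(len(x) != len(y) for x, y in zip(a, b)):
        raise ValueError("feature maps must have the same shape")
    sq = [(x - y) ** 2 for ca, cb in zip(a, b) for x, y in zip(ca, cb)]
    return sum(sq) / len(sq)


def plain_feature_loss(student: FeatureMap, teacher: FeatureMap) -> float:
    """Unnormalized MSE: also penalizes any gap in activation magnitude."""
    return _mse(student, teacher)


def normalized_feature_loss(student: FeatureMap, teacher: FeatureMap) -> float:
    """MSE after normalizing both maps, so magnitude differences are factored out."""
    return _mse(normalize(student), normalize(teacher))


if __name__ == "__main__":
    teacher = [[1.0, 2.0, 3.0, 4.0], [0.5, -0.5, 1.5, 2.5]]
    # Same spatial pattern as the teacher, one tenth of the magnitude.
    student = [[0.1 * v for v in ch] for ch in teacher]
    print("plain:", plain_feature_loss(student, teacher))
    print("normalized:", normalized_feature_loss(student, teacher))
```

In this example the student reproduces the teacher's spatial pattern at one tenth of its magnitude. The plain loss is large because of the magnitude gap alone, while the normalized loss is close to zero. Under these assumptions, the student is penalized only for mismatches in the pattern of its responses, not in their scale.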