LGMM Jun 11, 2022

Reducing Capacity Gap in Knowledge Distillation with Review Mechanism for Crowd Counting

arXiv:2206.05475v1
5 citations
h-index: 22
Originality: Incremental advance
AI Analysis

This paper addresses the limited student-network performance caused by the capacity gap in knowledge distillation for crowd counting. It offers a plug-and-play solution that is incremental but effective.

The paper tackles the capacity gap issue in knowledge distillation for crowd counting by introducing a review mechanism, resulting in a model called ReviewKD that outperforms existing lightweight models and even surpasses the teacher network's performance.

Lightweight crowd counting models, in particular knowledge distillation (KD) based models, have attracted rising attention in recent years due to their advantages in computational efficiency and hardware requirements. However, existing KD-based models usually suffer from the capacity gap issue, which leaves the performance of the student network limited by that of the teacher network. In this paper, we address this issue by introducing a novel review mechanism following KD models, motivated by the way human beings review material during study. The proposed model is thus dubbed ReviewKD. It consists of an instruction phase and a review phase: in the instruction phase, a well-trained heavy teacher network transfers its latent features to a lightweight student network; in the review phase, a review mechanism yields a refined estimate of the density map based on the learned features. The effectiveness of ReviewKD is demonstrated by experiments on six benchmark datasets, with comparisons against state-of-the-art models. Numerical results show that ReviewKD outperforms existing lightweight models for crowd counting, effectively alleviates the capacity gap issue, and in particular achieves performance beyond that of the teacher network. Beyond lightweight models, we also show that the proposed review mechanism can be used as a plug-and-play module to further boost the performance of a class of heavy crowd counting models without modifying the network architecture or introducing any additional model parameters.
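The two phases described in the abstract can be illustrated with a toy sketch in plain Python. This is not the ReviewKD implementation: the abstract does not specify the loss or the form of the review mechanism, so the mean-squared feature loss and the residual-correction step below are assumptions, and the names `feature_mse`, `crowd_count`, and `review_step` are hypothetical. The one standard fact used is that a density map sums to the estimated count.

```python
from typing import List

Grid = List[List[float]]  # a tiny 2-D map standing in for a feature or density map


def feature_mse(student: Grid, teacher: Grid) -> float:
    """Mean squared error between same-shaped feature maps.

    Assumption: an instruction-phase loss of this kind pulls the student's
    latent features toward the teacher's. The paper's actual loss may differ.
    """
    total, n = 0.0, 0
    for s_row, t_row in zip(student, teacher):
        for s, t in zip(s_row, t_row):
            total += (s - t) ** 2
            n += 1
    return total / n


def crowd_count(density: Grid) -> float:
    """The estimated count is the sum of the density map."""
    return sum(sum(row) for row in density)


def review_step(density: Grid, correction: Grid) -> Grid:
    """Hypothetical review step: add a residual correction, clamp at zero.

    Assumption: the real review mechanism is learned; this only shows the
    idea of refining an initial density estimate.
    """
    return [
        [max(0.0, d + c) for d, c in zip(d_row, c_row)]
        for d_row, c_row in zip(density, correction)
    ]


teacher_feat = [[1.0, 2.0], [3.0, 4.0]]
student_feat = [[1.0, 2.0], [3.0, 2.0]]
loss = feature_mse(student_feat, teacher_feat)

initial = [[0.5, 0.25], [0.25, 0.0]]
refined = review_step(initial, [[0.25, -0.5], [0.0, 0.0]])
print(loss, crowd_count(initial), crowd_count(refined))
```

In this toy example the correction redistributes density between cells, and negative values are clamped because a density map cannot be negative.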

Code Implementations: 1 repo
