Feature Representation Transferring to Lightweight Models via Perception Coherence
This work addresses the challenge of compressing models for deployment on resource-constrained devices, offering an incremental improvement over existing knowledge distillation methods.
The paper tackles the problem of transferring feature representations from large teacher models to lightweight student models. It introduces a new notion called perception coherence, which uses dissimilarity rankings to preserve global coherence without requiring the student to preserve the teacher's absolute geometry. Experiments show that the method outperforms or matches strong baselines.
In this paper, we propose a method for transferring feature representations from larger teacher models to lightweight student models. We mathematically define a new notion called \textit{perception coherence}. Based on this notion, we propose a loss function that accounts for the dissimilarities between data points in feature space through their ranking. At a high level, by minimizing this loss function, the student model learns to mimic how the teacher model \textit{perceives} inputs. More precisely, our method is motivated by the fact that the representational capacity of the student model is weaker than that of the teacher model. Hence, we aim to develop a method that allows for a better relaxation: the student model does not need to preserve the absolute geometry of the teacher's feature space, as long as it preserves global coherence through dissimilarity ranking. Importantly, while rankings are defined only on finite sets, our notion of \textit{perception coherence} extends them into a probabilistic form. This formulation depends on the input distribution and applies to general dissimilarity metrics. Our theoretical insights provide a probabilistic perspective on the process of feature representation transfer. Our experimental results show that our method outperforms or performs on par with strong baseline methods for representation transfer.
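To make the idea of preserving dissimilarity rankings rather than absolute geometry concrete, the following is a minimal illustrative sketch. It is not the loss function proposed in the paper. It is a simple, non-differentiable diagnostic that counts how often teacher and student agree on which of two points is closer to a given anchor. The function names (`euclidean`, `ranking_agreement`), the choice of Euclidean distance, and the toy feature vectors are all assumptions made for illustration.

```python
import math
from itertools import combinations


def euclidean(u, v):
    """Euclidean distance; an assumed choice, any dissimilarity could be passed instead."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def ranking_agreement(teacher_feats, student_feats, dissim=euclidean):
    """Fraction of (anchor, j, k) triplets whose dissimilarity ordering matches.

    For each anchor i and each pair (j, k) of other points, check whether the
    teacher and the student agree on which of j, k is closer to i.
    Hypothetical helper for illustration, not the paper's definition.
    """
    n = len(teacher_feats)
    if n != len(student_feats) or n < 3:
        raise ValueError("need the same number of points (at least 3) in both spaces")
    agree = 0
    total = 0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for j, k in combinations(others, 2):
            t = dissim(teacher_feats[i], teacher_feats[j]) - dissim(teacher_feats[i], teacher_feats[k])
            s = dissim(student_feats[i], student_feats[j]) - dissim(student_feats[i], student_feats[k])
            total += 1
            # Concordant if both differences have the same sign (or both are ties).
            if t * s > 0 or (t == 0 and s == 0):
                agree += 1
    return agree / total


# Toy data: the teacher lives in 3-D, the student in 2-D at a much smaller scale.
teacher = [(0, 0, 0), (1, 0, 0), (0, 3, 0), (5, 5, 5)]
student_good = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.3), (0.5, 0.5)]
student_bad = [(0.5, 0.5), (0.0, 0.3), (0.1, 0.0), (0.0, 0.0)]

print(ranking_agreement(teacher, student_good))
print(ranking_agreement(teacher, student_bad))
```

In this toy example, `student_good` has a different dimension and scale from the teacher yet keeps every per-anchor ordering, while `student_bad` breaks some of them. A trainable objective would need a differentiable surrogate for this count, which this sketch does not attempt.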